Solving MDP Value function with large state using Neural Networks


I have a KNOWN MDP tuple {S,A,P,R} representing user-type arrivals.

I KNOW all of these parameters: the reward function and the transition probabilities. I have the formula of the value function:

$V(s_0,\dots,s_n)=\frac{1}{2n}\max\bigl[V(s_0,\dots,s_n),\, V(s_0+1,\dots,s_n)+5\bigr] +\frac{1}{2n}\max\bigl[V(s_0,\dots,s_n+1),\, V(s_0,\dots,s_n)+5\bigr] + \frac{1}{2n}\, V(s_0-1,\dots,s_n)+\frac{1}{2n}\, V(s_0,\dots,s_n-1)$

The value function for each state could be computed exactly by dynamic programming, but the state space is too large to traverse in full. All the reinforcement learning algorithms I know of assume interaction with an environment; here there is no interaction, just a known value function that I want to compute.
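For a small instance the dynamic-programming route is straightforward. Below is a minimal tabular value-iteration sketch of the equation above, assuming $n=2$ dimensions, states clipped to $\{0,\dots,S_{\max}\}$, and a discount factor $\gamma<1$ so the fixed-point iteration contracts; $S_{\max}$, $\gamma$, and the boundary rule are all my assumptions, not part of the original problem.

```python
import numpy as np

# Assumed problem constants (not given in the question).
S_MAX = 20    # each state component lives in {0, ..., S_MAX}
GAMMA = 0.95  # assumed discount so the backup is a contraction
N = 2         # number of state dimensions
BONUS = 5.0   # the "+5" reward from the value equation

V = np.zeros((S_MAX + 1, S_MAX + 1))

def clip(s):
    # Assumed boundary handling: stay inside the grid.
    return min(max(s, 0), S_MAX)

for sweep in range(1000):
    V_new = np.empty_like(V)
    for s0 in range(S_MAX + 1):
        for s1 in range(S_MAX + 1):
            stay = V[s0, s1]
            up0 = V[clip(s0 + 1), s1]   # V(s0+1, s1)
            up1 = V[s0, clip(s1 + 1)]   # V(s0, s1+1)
            dn0 = V[clip(s0 - 1), s1]   # V(s0-1, s1)
            dn1 = V[s0, clip(s1 - 1)]   # V(s0, s1-1)
            # Discounted version of the stated Bellman equation.
            V_new[s0, s1] = GAMMA / (2 * N) * (
                max(stay, up0 + BONUS)
                + max(up1, stay + BONUS)
                + dn0
                + dn1
            )
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```

This works because each of the four backup terms is 1-Lipschitz in $V$ and their coefficients sum to $\gamma$, so the update is a $\gamma$-contraction; the problem, as the question says, is that the nested loops blow up once $n$ grows.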

How can I use neural networks to approximate this value function while traversing only part of the state space?
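One standard answer is fitted value iteration: instead of sweeping every state, sample a batch of states, evaluate the right-hand side of the known Bellman equation with the current network, and regress the network onto those targets. The sketch below follows the displayed formula literally (only the first and last components have increment/decrement terms); the dimensions, grid size, discount factor, network width, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem constants (not given in the question).
N = 10        # state dimension, too large for a full DP table
S_MAX = 50
GAMMA = 0.95  # assumed discount for a well-defined fixed point
BONUS = 5.0

# Tiny one-hidden-layer MLP: V(s) = w2 . tanh(W1 x + b1) + b2,
# with inputs normalized to [0, 1].
H = 64
W1 = rng.normal(0, 0.1, (H, N)); b1 = np.zeros(H)
w2 = rng.normal(0, 0.1, H);      b2 = 0.0

def predict(S):                     # S: (batch, N)
    return np.tanh((S / S_MAX) @ W1.T + b1) @ w2 + b2

def bellman_target(S):
    """Right-hand side of the value equation, using the current net."""
    stay = predict(S)
    up0 = S.copy(); up0[:, 0] = np.clip(up0[:, 0] + 1, 0, S_MAX)
    up1 = S.copy(); up1[:, -1] = np.clip(up1[:, -1] + 1, 0, S_MAX)
    dn0 = S.copy(); dn0[:, 0] = np.clip(dn0[:, 0] - 1, 0, S_MAX)
    dn1 = S.copy(); dn1[:, -1] = np.clip(dn1[:, -1] - 1, 0, S_MAX)
    return GAMMA / (2 * N) * (
        np.maximum(stay, predict(up0) + BONUS)
        + np.maximum(predict(up1), stay + BONUS)
        + predict(dn0)
        + predict(dn1)
    )

LR = 1e-3
for step in range(2000):
    S = rng.integers(0, S_MAX + 1, (128, N)).astype(float)
    y = bellman_target(S)           # fixed targets for this step
    X = S / S_MAX
    h = np.tanh(X @ W1.T + b1)
    err = h @ w2 + b2 - y           # squared-error residual
    # Manual backprop for the tiny net.
    g_w2 = h.T @ err / len(S); g_b2 = err.mean()
    g_h = np.outer(err, w2) * (1 - h ** 2)
    g_W1 = g_h.T @ X / len(S); g_b1 = g_h.mean(axis=0)
    w2 -= LR * g_w2; b2 -= LR * g_b2
    W1 -= LR * g_W1; b1 -= LR * g_b1
```

The key point is that because the model is known, no environment interaction is needed: the sampled states play the role of a training set, and each outer iteration is one approximate Bellman backup projected onto the network's function class. In practice you would replace the hand-rolled MLP with an autodiff framework, but the structure is the same.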