Solving MDP Value function with large state using Neural Networks


I have a KNOWN MDP tuple {S,A,P,R} representing user-type arrivals.

I KNOW all of these parameters: the reward function and the transition probabilities. I have the formula of the value function:

$V(s_0,\dots,s_n)=\frac{1}{2n}\max\bigl[V(s_0,\dots,s_n),\, V(s_0+1,\dots,s_n)+5\bigr] +\frac{1}{2n}\max\bigl[V(s_0,\dots,s_n+1),\, V(s_0,\dots,s_n)+5\bigr] + \frac{1}{2n}\, V(s_0-1,\dots,s_n)+\frac{1}{2n}\, V(s_0,\dots,s_n-1)$

The value function for each state could be computed exactly by dynamic programming, but the state space is too large to traverse in full. All the reinforcement learning algorithms I know of assume interaction with an environment; here there is no interaction, just a known value function that I want to compute.
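For a small instance the dynamic-programming route is straightforward. Below is a minimal tabular value-iteration sketch of the equation above, assuming $n=2$ dimensions, states clipped to $\{0,\dots,S_{\max}\}$, and a discount factor $\gamma<1$ so the fixed-point iteration contracts; $S_{\max}$, $\gamma$, and the boundary rule are all my assumptions, not part of the original problem.

```python
import numpy as np

# Assumed problem constants (not given in the question).
S_MAX = 20    # each state component lives in {0, ..., S_MAX}
GAMMA = 0.95  # assumed discount so the backup is a contraction
N = 2         # number of state dimensions
BONUS = 5.0   # the "+5" reward from the value equation

V = np.zeros((S_MAX + 1, S_MAX + 1))

def clip(s):
    # Assumed boundary handling: stay inside the grid.
    return min(max(s, 0), S_MAX)

for sweep in range(1000):
    V_new = np.empty_like(V)
    for s0 in range(S_MAX + 1):
        for s1 in range(S_MAX + 1):
            stay = V[s0, s1]
            up0 = V[clip(s0 + 1), s1]   # V(s0+1, s1)
            up1 = V[s0, clip(s1 + 1)]   # V(s0, s1+1)
            dn0 = V[clip(s0 - 1), s1]   # V(s0-1, s1)
            dn1 = V[s0, clip(s1 - 1)]   # V(s0, s1-1)
            # Discounted version of the stated Bellman equation.
            V_new[s0, s1] = GAMMA / (2 * N) * (
                max(stay, up0 + BONUS)
                + max(up1, stay + BONUS)
                + dn0
                + dn1
            )
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```

This works because each of the four backup terms is 1-Lipschitz in $V$ and their coefficients sum to $\gamma$, so the update is a $\gamma$-contraction; the problem, as the question says, is that the nested loops blow up once $n$ grows.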

How can I use neural networks to approximate this value function while traversing only part of the state space?
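One standard answer is fitted value iteration: instead of sweeping every state, sample a batch of states, evaluate the right-hand side of the known Bellman equation with the current network, and regress the network onto those targets. The sketch below follows the displayed formula literally (only the first and last components have increment/decrement terms); the dimensions, grid size, discount factor, network width, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem constants (not given in the question).
N = 10        # state dimension, too large for a full DP table
S_MAX = 50
GAMMA = 0.95  # assumed discount for a well-defined fixed point
BONUS = 5.0

# Tiny one-hidden-layer MLP: V(s) = w2 . tanh(W1 x + b1) + b2,
# with inputs normalized to [0, 1].
H = 64
W1 = rng.normal(0, 0.1, (H, N)); b1 = np.zeros(H)
w2 = rng.normal(0, 0.1, H);      b2 = 0.0

def predict(S):                     # S: (batch, N)
    return np.tanh((S / S_MAX) @ W1.T + b1) @ w2 + b2

def bellman_target(S):
    """Right-hand side of the value equation, using the current net."""
    stay = predict(S)
    up0 = S.copy(); up0[:, 0] = np.clip(up0[:, 0] + 1, 0, S_MAX)
    up1 = S.copy(); up1[:, -1] = np.clip(up1[:, -1] + 1, 0, S_MAX)
    dn0 = S.copy(); dn0[:, 0] = np.clip(dn0[:, 0] - 1, 0, S_MAX)
    dn1 = S.copy(); dn1[:, -1] = np.clip(dn1[:, -1] - 1, 0, S_MAX)
    return GAMMA / (2 * N) * (
        np.maximum(stay, predict(up0) + BONUS)
        + np.maximum(predict(up1), stay + BONUS)
        + predict(dn0)
        + predict(dn1)
    )

LR = 1e-3
for step in range(2000):
    S = rng.integers(0, S_MAX + 1, (128, N)).astype(float)
    y = bellman_target(S)           # fixed targets for this step
    X = S / S_MAX
    h = np.tanh(X @ W1.T + b1)
    err = h @ w2 + b2 - y           # squared-error residual
    # Manual backprop for the tiny net.
    g_w2 = h.T @ err / len(S); g_b2 = err.mean()
    g_h = np.outer(err, w2) * (1 - h ** 2)
    g_W1 = g_h.T @ X / len(S); g_b1 = g_h.mean(axis=0)
    w2 -= LR * g_w2; b2 -= LR * g_b2
    W1 -= LR * g_W1; b1 -= LR * g_b1
```

The key point is that because the model is known, no environment interaction is needed: the sampled states play the role of a training set, and each outer iteration is one approximate Bellman backup projected onto the network's function class. In practice you would replace the hand-rolled MLP with an autodiff framework, but the structure is the same.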