Dynamic Programming: convergence theorems

304 Views Asked by At

Sorry my ignorance in advance, but I've failed to find the answer to this question both in Bellman's book and Google.

I have the following optimization problem (similar to Markov decision process), which I want to numerically solve using dynamic programming approach:

\begin{equation} \pi(s) = \arg \min_{a} \int ds' p_a(s,s')\left(R_a(s,s') + \gamma V(s')\right), \end{equation} \begin{equation} V(s) = \int ds' p_{\pi(s)}(s,s')\left(R_{\pi(s)}(s,s') + \gamma V(s')\right), \end{equation}

where $s$ is continuous state. $R_a(s,s')$ — known transition cost from state $s'$ to state $s$ under action $a$. $\gamma$ is a constant close to $1$, $V(s)$ is a cost function $\mathbf{R} \rightarrow \mathbf{R}$. $\pi(s)$ — optimal policy function $\mathbf{R} \rightarrow A$, where $A$ is finite action space.

It is known that the solution $V(s)$ and $\pi(s)$ exists. However, to solve the problem numerically one needs to operate with discrete state space. Instead of state variable $s$ there would be a finite state grid $\hat s_i$ defined with some step $\Delta s$ and $K$ nodes. Also, instead of $V(s)$, there would be a vector of $\hat V_i$ ($V(s)$ calculated on discrete grid $\hat s_i$ in essence).

So, the following problem is solved:

\begin{equation} \hat \pi_i = \arg \min_{a} \sum_{j} p_a(\hat s_i,\hat s_j)\left(R_a(\hat s_i,\hat s_j) + \gamma \hat V_j\right), \end{equation} \begin{equation} \hat V_i = \sum_{j} p_{\hat \pi_i}(\hat s_i,\hat s_j)\left(R_{\hat \pi_i}(\hat s_i,\hat s_j) + \gamma \hat V_j\right), \end{equation}

The thing that I missed is the following. There must be a convergence theorem which states that $\hat V_i \rightarrow V(\hat s_i)$ as $K \rightarrow \infty $ and $\hat \pi_i \rightarrow \pi(\hat s_i)$ as $K \rightarrow \infty$. In other simple words, I am looking for prove why discrete numerical solution can be used instead of read continuous solution.

Could you please provide a reference to this theory, please?

1

There are 1 best solutions below

0
On BEST ANSWER

Required theorem is available in the following paper:

C.S. Chow and J.N. Tsitsiklis. An optimal one-way multigrid algorithm for discrete- time stochastic control. IEEE Transactions on Automatic Control , 36:898ñ914, 1991.

http://web.mit.edu/jnt/www/Papers/J033-91-chow-mgrid.pdf