Convergence of value iteration of dynamic programming

73 Views Asked by At

$\newcommand{\R}{\mathbb{R}}\newcommand{\T}{\mathbb{T}}$Let $\ell:\R^n\times\R^m\to[0, \infty)$ be a continuous stage cost function and $f:\R^n\times\R^m\to\R^n$ be a continuous function associated with the system

$$ x_{t+1} = f(x_t, u_t). $$

Let $V_f:\R^n\to[0, \infty)$ be a continuous terminal cost and define

$$ V_N^\star(x) = \min_{(u_t)_{t=0}^{N-1}}\left\{ \sum_{t=0}^{N-1} \ell(x_t, u_t) + V_f(x_N) \,{}\Big|{}\, x_{t+1} = f(x_t, u_t), t=0,\ldots, N-1 \right\}, $$

and define $V_\infty^\star$ analogously:

$$ V_\infty^\star(x) = \min_{(u_t)_{t=0}^{\infty}}\left\{ \sum_{t=0}^{\infty} \ell(x_t, u_t) \,{}\Big|{}\, x_{t+1} = f(x_t, u_t), t\in\mathbb{N} \right\}, $$

The dynamic programming (Bellman) operator for a function $V:\R^n\to[0, \infty)$ is defined as

$$ (\T{}V)(x) = \min_{u}\{\ell(x, u) + V(f(x, u))\}, $$

assuming that the minimum is attained.

The value iteration is:

$$ V_{t+1}^\star(x) = \T V_t^\star(x), $$

starting with $V_{t}^\star(x) = V_f(x)$.

Question: I would like to show the following: suppose that the minimum in $V_\infty^\star$ is attained and $V_0^\star \leq V_\infty^\star$. Then, $V_\infty^\star(x) = \lim_{t\to\infty}V_t^\star(x)$.

In the simple case: $V_0^\star(x) = 0$ we have that $$ V_1^\star(x) = \T V_0^\star(x) = \min_{u}\{\ell(x, u) + 0\} \geq 0 = V_0^\star(x). $$ Inductively we can show that $V_0^\star(x) \leq V_1^\star(x) \leq \ldots V_t^\star(x) \leq V_\infty^\star(x)$, that is, the sequence is monotonically nondecreasing and upper bounded, therefore it converges to a limit that satisfies Bellman's equation and the corresponding minimizer defines an optimal stationary policy.

However, it $V_0^\star \leq V_\infty^\star$, I have only been able to show the following:

$$ V_0^\star \leq V_\infty^\star \Rightarrow \T V_0^\star \leq \T V_\infty^\star \Leftrightarrow V_1^\star \leq V_\infty^\star. $$

Inductively, $V_t^\star \leq V_\infty^\star$, however it is not clear to me why/whether $V_t^\star$ converges.