Proof that Bellman Optimality Operator converges

745 Views Asked by At

Let $T^*$ be the Bellman Optimality Operator defined as follow:

$$(T^*Q)(s,a) = R(s,a) + \gamma \max_{a' \in A} \sum_{s' \in S} P(s'|s,a)Q(s',a')$$

with $0 \leq P(s'|s,a) \leq 1$ and $\sum_{s' \in S} P(s'|s,a) = 1$.

I can proove that $T^*$ is a max-norm contraction, in other word (if I get it correctly)

$$\lVert T^*Q - T^*Q'\rVert \leq \lVert Q - Q'\rVert $$

And in this way, it is possible to say that:

$$\lVert Q_{t+2} - Q_{t+1}\rVert \leq \lVert Q_{t+1} - Q_{t}\rVert$$

provided that

$$Q_{t+1} = T^*Q_{t}$$

In this way I can only show that, iterating over the bellman operator, the difference between $Q_{t+1}$ and $Q_t$ get lower (in module), but it doesn't mean that $\lim_{t \to \infty} \lVert Q_{t+1} - Q_t \rVert = 0$. And I'm interested in showing this last sentence.

P.S. $Q_0 = 0$