Contraction of Bellman Operator under general $L_p$ norms


We know that the Bellman operator $$ TV(s) = \max_a \Big[\, r(s,a) + \sum_{s' \in S} p(s'|s,a)\,V(s') \Big] $$ is a contraction under the $L_\infty$ norm; for reference, see Proof that Bellman update is a contraction. (Strictly, a discount factor $\gamma \in [0,1)$ multiplying the sum is needed for the contraction, with modulus $\gamma$; I drop it below since it does not affect the question.)
A couple of definitions are warranted: $p(s'|s,a)$ is the probability of transitioning to state $s'$ when action $a$ is taken in state $s$; $S$ is the set of all possible states; $r(s,a)$ is the reward incurred in state $s$ when playing action $a$; and $V(s')$ is the cumulative reward earned from state $s'$ over the entire horizon when optimal actions are chosen at each subsequent state.
I was wondering what stops it from being a contraction in any $L_p$ norm. Of course, in finite dimensions this should not be a problem, since all norms are equivalent there, so I guess the only interesting case is infinite dimensions.
My idea is that $$ \begin{split}\|TV - T\bar{V}\|_p &= \bigg(\sum_s\Big|\sum_{s'}p(s'|s,a^*)\big(V(s')-\bar{V}(s')\big)\Big|^p\bigg)^{1/p}\\ &\leq \sum_{s'}p(s'|s,a^*)\bigg(\sum_s\big|V(s')-\bar{V}(s')\big|^p\bigg)^{1/p} \end{split}$$ by Jensen's inequality, and then $\sum_{s'}p(s'|s,a^*) = 1$ gives $$ \|TV - T\bar{V}\|_p \leq \|V - \bar{V}\|_p. $$ What is the mistake?
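As a numerical sanity check on the claimed bound, here is a sketch with a hypothetical two-state MDP (one action, zero rewards, both states jumping deterministically to state $0$); the names and the MDP itself are illustrative, not from any particular source:

```python
import numpy as np

# Hypothetical 2-state, 1-action MDP: both states transition
# deterministically to state 0, and all rewards are zero.
P = np.array([[1.0, 0.0],   # from state 0: go to state 0
              [1.0, 0.0]])  # from state 1: go to state 0
r = np.zeros(2)

def bellman(V):
    # With a single action, T V(s) = r(s) + sum_{s'} p(s'|s) V(s').
    return r + P @ V

V = np.array([1.0, 0.0])
Vbar = np.array([0.0, 0.0])

TV, TVbar = bellman(V), bellman(Vbar)

# Ratio of output to input distance in each norm.
ratio_inf = np.max(np.abs(TV - TVbar)) / np.max(np.abs(V - Vbar))
ratio_2 = np.linalg.norm(TV - TVbar) / np.linalg.norm(V - Vbar)

print(ratio_inf)  # 1.0  -> nonexpansive in the sup norm
print(ratio_2)    # ~1.414 = sqrt(2) -> expands in the L2 norm
```

Since the $L_2$ ratio comes out as $\sqrt{2} > 1$, the claimed inequality $\|TV - T\bar{V}\|_2 \leq \|V - \bar{V}\|_2$ must fail somewhere, so there really is a mistake in the derivation.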

Accepted answer:

So I found my error: my statement of Jensen's inequality was wrong. For the convex map $x \mapsto x^p$ it gives $\big(\sum_{s'}p(s'|s,a^*)\,d(s')\big)^p \leq \sum_{s'}p(s'|s,a^*)\,d(s')^p$, which does not let one pull the sum over $s'$ outside the $(\cdot)^{1/p}$ as I did.

In terms of the $L_2$ norm, the correct inequality one obtains this way is

$$\|TV - T\bar{V}\|_2 \leq \sqrt{|S|}\,\|V-\bar{V}\|_2,$$ and since the $\sqrt{|S|}$ factor can actually be attained, the operator need not be a contraction in $L_2$ even in finite dimensions.
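A quick way to see where the $\sqrt{|S|}$ comes from is the standard sup-norm argument (a sketch). Coordinate-wise,

$$|TV(s) - T\bar{V}(s)| \leq \max_a \sum_{s'} p(s'|s,a)\,\big|V(s')-\bar{V}(s')\big| \leq \|V-\bar{V}\|_\infty,$$

and then, using $\|x\|_2 \leq \sqrt{|S|}\,\|x\|_\infty$ and $\|x\|_\infty \leq \|x\|_2$,

$$\|TV - T\bar{V}\|_2 \leq \sqrt{|S|}\,\|TV - T\bar{V}\|_\infty \leq \sqrt{|S|}\,\|V-\bar{V}\|_\infty \leq \sqrt{|S|}\,\|V-\bar{V}\|_2.$$

The two-state MDP in which every state transitions deterministically to a single state shows the factor is tight, so no better constant is available in general.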