We know that the Bellman operator
$$
TV(s) = \max_a \Big[r(s,a) + \gamma \sum_{s' \in S} p(s'|s,a)\,V(s')\Big],
$$
with discount factor $\gamma \in [0,1)$, is a contraction under the $L_\infty$ norm. For reference, see the following link: Proof that Bellman update is a contraction.
A couple of definitions are warranted:
$p(s'|s,a)$ is the probability of landing in state $s'$ when action $a$ is taken in state $s$; $S$ is the set of all possible states; $r(s,a)$ is the reward incurred at state $s$ when playing action $a$; and $V(s')$ is the reward incurred from state $s'$ over the entire horizon when optimal actions are chosen at each subsequent state.
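As a concrete sanity check, the $L_\infty$ contraction can be verified numerically on a small MDP. The following is a minimal sketch, assuming a hypothetical two-state, two-action MDP with made-up rewards and transition probabilities and a discount factor $\gamma = 0.9$ (it is the discount factor that supplies the contraction modulus):

```python
import random

# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
n_states, n_actions, gamma = 2, 2, 0.9
r = [[1.0, 0.0],                    # r[s][a]: reward for action a in state s
     [0.5, 2.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],      # P[s][a][sp] = p(s'|s,a)
     [[0.5, 0.5], [0.3, 0.7]]]

def bellman(V):
    """Apply the Bellman optimality operator T to a value vector V."""
    return [max(r[s][a] + gamma * sum(P[s][a][sp] * V[sp] for sp in range(n_states))
                for a in range(n_actions))
            for s in range(n_states)]

def linf(x):
    return max(abs(v) for v in x)

# For random pairs (V, W), check ||TV - TW||_inf <= gamma * ||V - W||_inf.
random.seed(0)
for _ in range(1000):
    V = [random.uniform(-10, 10) for _ in range(n_states)]
    W = [random.uniform(-10, 10) for _ in range(n_states)]
    gap = linf([tv - tw for tv, tw in zip(bellman(V), bellman(W))])
    assert gap <= gamma * linf([v - w for v, w in zip(V, W)]) + 1e-12
print("L_inf contraction with modulus gamma held on all random pairs")
```

On every random pair the gap $\|TV - T\bar{V}\|_\infty$ stays within $\gamma\,\|V - \bar{V}\|_\infty$, as the standard proof predicts.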
I was wondering what stops it from being a contraction in any $L_p$ norm. Of course, in finite dimensions it should not be a problem, because all norms are equivalent in finite dimensions, so I guess the only interesting case is infinite dimensions.
My idea is that
$$
\begin{split}\|TV - T\bar{V}\|_p &= \bigg(\sum_s\Big|\sum_{s'}p(s'|s,a^*)\big(V(s')-\bar{V}(s')\big)\Big|^p\bigg)^{1/p}\\
&\leq \sum_{s'}p(s'|s,a^*)\bigg(\sum_s\big|V(s')-\bar{V}(s')\big|^p\bigg)^{1/p}
\end{split}$$
by Jensen's inequality, and then use $\sum_{s'}p(s'|s,a^*) = 1$
to get
$$
\|TV - T\bar{V}\|_p \leq \|V - \bar{V}\|_p.
$$
What is the mistake?
Contraction of Bellman Operator under general $L_p$ norms
507 views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail) at 2026-03-26 06:29:09.
So I found my error: my statement of Jensen's inequality was wrong. The correct application, for the $L_2$ norm, is
$$\Big(\sum_{s'} p(s'|s,a^*)\big(V(s')-\bar{V}(s')\big)\Big)^2 \leq \sum_{s'} p(s'|s,a^*)\big(V(s')-\bar{V}(s')\big)^2;$$
summing over $s$ and using $\sum_s p(s'|s,a^*) \leq |S|$ then gives
$$\|TV - T\bar{V}\|_2 \leq \sqrt{|S|}\,\|V-\bar{V}\|_2,$$
which obviously cannot establish a contraction, even in finite dimensions.