Jensen's inequality tells us variation of $x$ will increase the average value of $f(x)$?


This is from Boyd's convex optimization 6.4.1 stochastic robust approximation (p. 319):

"When the matrix $A$ is subject to variation, the vector $Ax$ will have more variation the larger $x$ is, and Jensen's inequality tells us that variation in $Ax$ will increase the average value of $\|Ax -b\|_2$."

I'm confused about the claim "Jensen's inequality tells us that variation in $Ax$ will increase the average value of $\|Ax -b\|_2$". Taking $z = Ax$ as a single random variable, I read the statement as "larger $\textbf{var}(z) \Rightarrow$ larger $E f(z)$, where $f$ is convex".

Jensen's inequality tells us $E f(z) \geq f(E(z))$, which bounds $E f(z)$ from below by $f$ evaluated at the mean $E(z)$. However, how can we connect this to $\textbf{var}(z)$ instead?
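As a quick numerical sanity check of the intuition being asked about, here is a minimal sketch (my own illustration, not from Boyd) that estimates the Jensen gap $E f(z) - f(E(z))$ for a simple convex $f$ as the spread of $z$ grows. For $f(z) = z^2$ the gap is exactly $\textbf{var}(z)$, so it should increase with the spread:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda z: z ** 2            # a simple convex f; gap equals var(z) here

gaps = []
for sigma in [0.5, 1.0, 2.0]:   # increasing spread of z around the same mean
    z = 1.0 + sigma * rng.standard_normal(100_000)
    gaps.append(f(z).mean() - f(z.mean()))  # Jensen gap E[f(z)] - f(E[z])
print(gaps)
```

The gap is nonnegative and growing here, but as the answer below explains, that monotone connection to variance is a feature of this particular $f$, not a general consequence of Jensen's inequality.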

Best answer:

You cannot connect to variance in general. For example if a random vector $X$ varies but is always in the nullspace of a (potentially random) matrix $A$, then $AX=0$ always and any function of $AX$ is constant (having variance 0), even though both $A$ and $X$ might vary with nonzero measures of variation. [You can easily compare that to examples with new matrix $\tilde{A}$ and/or new vector $\tilde{X}$, where both $\tilde{A}$ and $\tilde{X}$ have smaller measures of variation than their counterparts $A$ and $X$, but $\tilde{A}\tilde{X}$ is nonconstant.]
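The nullspace counterexample above can be checked numerically. This is a minimal sketch with a fixed (non-random) $A$ and an $X$ that varies only along the nullspace direction, so $AX = 0$ always:

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed 2x2 matrix whose nullspace is spanned by (1, -1).
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

# X varies a lot, but only along the nullspace direction (1, -1).
t = 10.0 * rng.standard_normal(100_000)
X = np.outer(t, [1.0, -1.0])          # shape (100000, 2)

AX = X @ A.T                          # each row is A @ X_i
norms = np.linalg.norm(AX, axis=1)    # f(AX) = ||A X_i||_2

print(np.var(X[:, 0]))   # X has large variance...
print(norms.var())       # ...yet ||AX|| is constant (variance 0)
```

So a large variance of $X$ says nothing by itself about the variance of $AX$ or of a convex function of it.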

In the special case when $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is $c$-strongly convex (for some $c>0$) and $X\in \mathbb{R}^n$ is a random vector with finite mean $m=E[X]$, you can connect to variance:

\begin{align} &f(X) \geq f(m) + f'(m)^{\top}(X-m) + \frac{c}{2}||X-m||^2 \\ &\implies E[f(X)] \geq f(m) + \frac{c}{2}\underbrace{E[||X-m||^2]}_{\sum_{i=1}^n Var(X_i)} \end{align} where $f'(m)$ is a subgradient of $f$ at $m$, which exists because $f$ is convex and $m$ is interior to $\mathbb{R}^n$. So the lower bound on the Jensen inequality gap grows with increasing $\sum_{i=1}^n Var(X_i)$.
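The strong-convexity bound can be verified numerically. This is a minimal sketch using $f(x) = \|x\|^2 + x_1^4$, which is $c$-strongly convex with $c = 2$ (the quartic term only adds curvature); the sample mean stands in for $m = E[X]$:

```python
import numpy as np

rng = np.random.default_rng(2)

# f(x) = ||x||^2 + x_1^4 is c-strongly convex with c = 2.
def f(x):
    x = np.atleast_2d(x)
    return np.sum(x * x, axis=1) + x[:, 0] ** 4

c = 2.0
X = rng.standard_normal((100_000, 2)) * [0.7, 1.3] + [1.0, -0.5]

m = X.mean(axis=0)                              # sample mean stands in for E[X]
lhs = f(X).mean()                               # E[f(X)]
rhs = f(m)[0] + 0.5 * c * X.var(axis=0).sum()   # f(m) + (c/2) * sum_i Var(X_i)

print(lhs >= rhs)  # the strong-convexity lower bound holds
```

Larger per-coordinate variances push the right-hand side up, which is the precise sense in which more variation forces a larger average value of $f$.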


When Boyd says it "will increase the average value," he is just giving an intuitive interpretation of Jensen's inequality for a convex function $f$; he is not stating a new theorem. More precise language would replace "will increase" with "will not decrease": comparing $E[f(X)]$ with $f(E[X])$, the former is always at least as large as the latter. Note also that he is only comparing $E[f(X)]$ and $f(E[X])$; he is not comparing $E[f(X)]$ and $E[f(\tilde{X})]$ for some other non-constant $\tilde{X}$.