Smoothness inequality over three points

89 Views Asked by At

A few years ago, I had taken note that if a convex function $f:\mathbb{R}^n\to\mathbb{R}$ has Lipschitz continuous gradient $\nabla f(x)$ with modulus $L$, then $$\langle \nabla f(x_1) - \nabla f(x_2), x_2 - x_3\rangle \leq \frac{L}{4}||x_1 - x_3||_2^2$$ for any $x_1,x_2,x_3\in\mathbb{R}^n$. How can I prove this inequality? Also, is it possible to obtain a lower bound to the inner product in terms of distance $||x_1 - x_3||_2^2$ if the function is also $m$-strongly convex?

Edit: when $f(x) = \frac{1}{2}||x||^2$, the inequality holds.

3

There are 3 best solutions below

0
On BEST ANSWER

After mvkbr's post, I tried to find the book cited by Antipin, but I could not find an English version. Then, I looked at a few other books by Russian scholars and finally found the answer in Section 7.2.1 of Polyak's Introduction to Optimization. I share the proof for completeness.

The proof is based on the co-coercivity of gradient: \begin{align} \langle \nabla f(x_1) - \nabla f(x_2),\ x_3 - x_2\rangle &= \langle \nabla f(x_1) - \nabla f(x_2),\ x_1 - x_2\rangle + \langle \nabla f(x_1) - \nabla f(x_2),\ x_3 - x_1\rangle \\ &\geq \frac{1}{L}||\nabla f(x_1) - \nabla f(x_2||^2 + \langle \nabla f(x_1) - \nabla f(x_2),\ x_3 - x_1\rangle \\ &\geq -\frac{L}{4}||x_3 - x_1||^2 \end{align} where the second inequality follows from the identity $$ ||a||^2 + \langle a,\ b\rangle \geq -\frac{1}{4}||b||^2 \qquad \forall a,b. $$ Negating both sides completes the proof.

If the function is also strongly convex with modulus $m$, then one can also obtain $$ \langle \nabla f(x_1) - \nabla f(x_2),\ x_2 - x_3\rangle \leq \frac{1}{4m}||\nabla f(x_1) - \nabla f(x_3)||^2 $$ using a similar proof technique.

5
On

Since $f$ is convex, we have the following inequalities $$ \begin{aligned} \nabla f(x_1)(x_2-x_1) & \le f(x_2) - f(x_1),\\ \nabla f(x_2)(x_3-x_2) & \le f(x_3) - f(x_2),\\ \nabla f(x_3)(x_1-x_3) & \le f(x_1) - f(x_3). \end{aligned} $$ Adding them results in $$ \nabla f(x_1)(x_2-x_1) + \nabla f(x_2)(x_3-x_2) + \nabla f(x_3)(x_1-x_3)\le 0. $$ So that $$ ( \nabla f(x_1) - \nabla f(x_2) ) (x_2-x_3) \le \nabla f(x_1)(x_1-x_3) -\nabla f(x_3)(x_1-x_3) \le L \|x_1-x_3\|^2. $$ Which is not quite the desired inequality.


Since the gradient is Lipschitz, we have the following stronger inequalities: $$ \begin{aligned} \nabla f(x_1)(x_2-x_1) + \frac1{2L} \|\nabla f(x_1) - \nabla f(x_2)\|^2 & \le f(x_2) - f(x_1),\\ \nabla f(x_2)(x_3-x_2) + \frac1{2L} \|\nabla f(x_2) - \nabla f(x_3)\|^2 & \le f(x_3) - f(x_2),\\ \nabla f(x_3)(x_1-x_3) + \frac1{2L} \|\nabla f(x_3) - \nabla f(x_1)\|^2 & \le f(x_1) - f(x_3), \end{aligned} $$ see this answer. For these gradients on the left-hand side we have $$ \frac1{2L} \|\nabla f(x_1) - \nabla f(x_2)\|^2 + \frac1{2L} \|\nabla f(x_2) - \nabla f(x_3)\|^2 + \frac1{2L} \|\nabla f(x_3) - \nabla f(x_1)\|^2 \\ \ge \frac 3{4L} \|\nabla f(x_3) - \nabla f(x_1)\|^2. $$ Proceeding as above, we get $$ ( \nabla f(x_1) - \nabla f(x_2) ) (x_2-x_3) + \frac 3{4L} \|\nabla f(x_3) - \nabla f(x_1)\|^2 \\ \le (\nabla f(x_1) -\nabla f(x_3))(x_1-x_3) \\ \le \frac 3{4L} \|\nabla f(x_3) - \nabla f(x_1)\|^2 + \frac L3 \|x_1-x_3\|^2, $$ which results in $$ ( \nabla f(x_1) - \nabla f(x_2) ) (x_2-x_3)\le\frac L3 \|x_1-x_3\|^2. $$ Still, not the correct constant. If $\|x_1-x_3\| \le \|\nabla f(x_1) - \nabla f(x_3)\|$, i.e., $f$ is $1$-strongly convex, then we get the estimate with $L/4$ instead of $L/3$.


A lower bound of this quantity cannot be written in terms of $\|x_1-x_3\|$: take $f(x) = \frac12x^2$. Then for $x_2 \to +\infty$, the term in question tends to $-\infty$.

1
On

This inequality is used in A.S. Antipin's "On Finite Convergence of Processes to a Sharp Minimum and to a Smooth Minimum with a Sharp Derivative" paper, though I do not know the proof.