Why does $\|\nabla f(x) - \nabla f(y)\| \leq L \|x-y \|$ imply $\| \nabla^2 f(x) \| \leq L$?

194 Views Asked by At

Let $f$ a twice continuously differentiable function. Also let $\nabla f$ be Lipschitz continuous with Lipschitz constant $L$, i.e.,

$$\|\nabla f(x) - \nabla f(y)\| \leq L \|x-y \|.\tag{1}$$

On page 25 of Nesterov's Lectures on Convex Optimization, it is stated that for any $s \in \mathbb{R}^n$ and any $a >0$, we have

$$\left\|s^T\left( \int_0^a \nabla^2 f (x + t s)dt \right) \right\| = \|\nabla f(x +a s) -\nabla f(x) \| \leq a L \| s \|\tag{2}$$

which when divided by $a$ and $a \to 0$ we obtain

$$\| \nabla^2 f(x) \| \leq L.\tag{3}$$

Thanks to @Oliver Díaz`'s comment, to prove $(2) \Rightarrow (3)$, we define $\phi(t)=\nabla f(x+ts)$. Then using $\frac{\partial \phi(t)}{\partial t} = s^T \nabla^2 f(x+ts)$ we have

$$\|\nabla f(x +a s) -\nabla f(x) \| = \|\phi(a) - \phi (0)\| = \left\|\int_0^a \frac{\partial \phi(t)}{\partial t} dt \right\| = \left\|\int_0^a s^T \nabla^2 f(x+ts) dt\right\|,$$ which by applying $(1)$ we get

$$\|\nabla f(x +a s) -\nabla f(x) \| \leq a L \| s\|.$$ Combining the latter two results we get $(2)$. Then, dividing by $a$, as the author in the book suggests, we get

$$\frac{1}{a}\left\|s^T\left( \int_0^a \nabla^2 f (x + t s)dt \right) \right\| = \frac{1}{a}\|\nabla f(x +a s) -\nabla f(x) \| \leq L \| s \|\tag{4}.$$ Could you please someone explain how we get $\|\nabla^2 f(x)\| \leq L $ by computing the limit of $(4)$ as $a\to 0$.

1

There are 1 best solutions below

4
On

\begin{align} &\frac{1}{a} \left\|s^\top \int_0^a \nabla^2 f(x+ts) \, dt \right\| \\ &= \frac{1}{a} \left\|s^\top \int_0^a (\nabla^2 f(x+ts) - \nabla^2 f(x) + \nabla^2 f(x)) \, dt \right\| \\ &\ge \|s^\top \nabla^2 f(x)\| - \frac{1}{a} \left\|s^\top \int_0^a (\nabla^2 f(x+ts) - \nabla^2 f(x)) \, dt \right\|. \end{align}

Since $f$ is twice continuously differentiable, the second term can be made smaller than any given $\epsilon > 0$ by choosing $a$ close enough to zero.

Thus we have $$\|s^\top \nabla^2 f(x)\| - \epsilon \le L \|s\|$$ for any $\epsilon > 0$.


Details about the vanishing term:

We have \begin{align} \frac{1}{a} \left\|s^\top \int_0^a (\nabla^2 f(x+ts) - \nabla^2 f(x)) \, dt \right\| \le \frac{1}{a} \int_0^a \|s^\top (\nabla^2 f(x+ts) - \nabla^2 f(x))\| \, dt. \end{align} Continuity of the second derivative of $f$ should imply that for any $\epsilon > 0$ there exists $a$ such that $\|s^\top (\nabla^2 f(x+ts) - \nabla^2 f(x))\| < \epsilon$ for any $s \in [0,a]$. Then the right-hand side can be bounded by $\epsilon$.