L-Lipschitz continious gradient f implies bounded norm of hessian

311 Views Asked by At

i am reading a paper which says that if

  1. a function $f: \mathbb{R}^d\to \mathbb{R}$ is twice continiously differentiable

and

  1. the gradient of $f$ i.e. $\nabla f$ ist L-Lipschitz-continious

then

the Hessian of $f$ has a bounded norm i.e. $||\nabla^2 f(x)||\leq L$ $\forall x$

i am not quite sure whether this is true; it is explicitly not given, that $f$ is convex

2

There are 2 best solutions below

0
On BEST ANSWER

This follows from the definition of the derivative.

In one dimension, you have $$ f'(x + h) = f'(x) + h f''(x) + o(h), \quad \mbox{as}~h \to 0. $$ This implies upon rearrangement that $|f''(x)| \leq L$, since $|f'(x + h) - f'(x)| \leq L h$.


In multiple dimensions, consider $f \colon \mathbb{R}^n \to \mathbb{R}$.

We similarly have

$$\nabla f(x + h) = \nabla f(x) + \langle \nabla^2 f(x), h \rangle + o(\|h\|) \quad \mbox{as}~h \to 0. $$ Upon re-arrangement, $$ \frac{|\langle \nabla^2f(x), h \rangle|}{\|h\|} \leq L \ + o(1), \quad \mbox{as}~h \to 0. $$ By considering $h = tu$, $t \to 0$, and $u$ a fixed unit vector, we immediately see that $\|\nabla^2 f(x)\| \leq L$.

0
On

It is true, and in a sense it is kinda what the Lipschitz property was defined to capture (at least that's what I've been told). I'll try to give some intuition rather than proving what should already be in plenty of books.

Usually, if you want to put a bound on how much a function can change (think of functions in $\mathbb{R}$, but everything still works in $\mathbb{R}^n$), you find the derivative of the function, and find it's maximum norm, since the derivative represents (instantaneous) rate of change. However, some functions are not differentiable, i.e. their instantaneous rate of change is not defined. But, we still want to, and should be able to, put some bound on how much the function can change in some interval. Think of the absolute value function. Sure it is not differentiable at $0$, but it is still "clear" that the rate of change is not greater than $\pm1$. This is exactly what Lipschitz continuity tries to formalize: the (non-instantaneous) rate of change in any interval in the domain is bounded by $L$. In particular, by taking the limit towards an infinitesimal interval, and if that limit exists, we get that the derivative is bounded by $L$.

In your case, since the Hessian is the "derivative" of the gradient, and the gradient is Lipschitz by assumption, then the Hessian will be bounded. This is not surprising, because it is precisely what the Lipschitz assumption was defined for.