Different definition for L smooth function.

2.7k Views Asked by At

At optimization class, professor gave the definition of L smooth function by

$f:\mathbf{R^n} \rightarrow \mathbf{R} $ is L smooth if all the eigenvalue for $\nabla^{2} f $ is smaller than L

where $\nabla^{2} f = \begin{pmatrix} \frac{\partial ^2 f}{\partial x_1 \partial x_1} & \frac{\partial ^2 f}{\partial x_1 \partial x_2 } & \cdots & \frac{\partial ^2 f}{\partial x_1 \partial x_n } \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial ^2 f}{\partial x_n \partial x_1} & \frac{\partial ^2 f}{\partial x_n \partial x_2 } & \cdots & \frac{\partial ^2 f}{\partial x_n \partial x_n } \end{pmatrix}$

but when I search for other books, it state that

$f$ is L smooth if $ \frac{\lVert \nabla f(x)- \nabla f(y)\rVert}{\lVert x-y \rVert} \leq L,$ $\forall x,y \in \mathbf{R^n}$

Are these two statement equivalent ?

What I have done is $ \frac{\lVert \nabla f(x)- \nabla f(y)\rVert}{\lVert x-y \rVert} \leq L,$ $\forall x,y \in \mathbf{R^n}$ $\implies$ $\lvert \frac{\partial^2 f}{\partial x_i \partial x_j} \rvert \leq L, \forall i,j$, but I don't know how it relate to the eigenvalue of $\nabla^2f$.

Later the professor had this

$f(x^t)-f(x^*)=\frac{1}{2}(x^t-x^*)^T\nabla^2f(\bar{x})^T(x^t-x^*)\leq \frac{L}{2}\lVert x^t-x^* \rVert^2$

where $x\in \mathbf{R^n},$ $x^t$ is t iteration for x, $x^*$ is the final form of $x$ s.t. $f(x^*)$ smallest, $\bar{x}$ is some point between $x^t$ and $x^*$.

This part $f(x^t)-f(x^*)=\frac{1}{2}(x^t-x^*)^T\nabla^2f(\bar{x})^T(x^t-x^*)$ is by Tyler expansion which I can understand

The second inequation $\frac{1}{2}(x^t-x^*)^T\nabla^2f(\bar{x})^T(x^t-x^*)\leq \frac{L}{2}\lVert x^t-x^* \rVert^2$ is where I don't get it. Is it by the definition professor stated for L smooth ?

1

There are 1 best solutions below

4
On

When the function $f$ is convex and twice differentiable, the following are equivalent:

  1. $\frac{L}{2}\|\cdot\|^2-f(\cdot)\quad$ is a convex function

  2. $\|\nabla f(x) - \nabla f(y)\|\leq L\|x-y\|$

  3. $0\leq \nabla^2 f(x)\leq L \mathrm{Id}\quad$ (where inequalities are understood in the Loewner ordering and $\mathrm{Id}$ denoted the identity matrix)

The last inequality you're interested in follows directly from 3., the inequality means for any $d$,

$$0\leq \langle d, \nabla^2f(x) d\rangle \leq \frac{L}{2}\|d\|^2$$