Gradient Descent convergence proof - understanding one step with Lipschitz function

532 Views Asked by At

I'm looking at a proof of Gradient descent convergenceLecture notes, but I can't follow where how Lemma 4.1 comes from the Lipschitz assumption.

A function f is L-Lipschitz if and only if: $|| \nabla f(x) - \nabla f(y)||_2 \le L||x-y|| $

Lemma 4.1: If $f$ is L-Lipschitz, then

$$|f(y)-f(x) - \langle\nabla f(x),y-x \rangle| \le \frac{L}{2}||y-x||^2$$

Can anyone explain where the Lemma comes from? Or point me to a reference that shows this?

Thanks

2

There are 2 best solutions below

0
On

$f(y) = f(x) + {\partial f(x) \over \partial x} (y-x) + \int_0^1 ( {\partial f(x+t(y-x)) \over \partial x} - { \partial f(x) \over \partial x} ) (y-x) dt$.

Hence $f(y) - f(x) - {\partial f(x) \over \partial x} (y-x) = \int_0^1 ( {\partial f(x+t(y-x)) \over \partial x} - { \partial f(x) \over \partial x} ) (y-x) dt$.

Note that $|\int_0^1 ( {\partial f(x+t(y-x)) \over \partial x} - { \partial f(x) \over \partial x} ) (y-x) dt | \le t\int_0^1 L \|x-y\|^2 dt = {1 \over 2} L\|x-y\|^2$.

0
On

This can be shown in two parts.

Part-1: Convert the lemma into another equivalent condition.

Firstly, we can rearrange the lemma as follows:

$$ f(y) \leq f(x) + \nabla f(x)^T(y-x) + \frac{L}{2}||y-x||^2 $$

In the context of convex optimisation, this means that the function $f$ can be upper bounded by a quadratic provided its gradient is Lipschitz continuous.

Now, since the lemma must hold for all $x, y \in \textbf{dom}(f)$, we can rewrite the inequality with $x$ and $y$ interchanged.

$$ f(x) \leq f(y) + \nabla f(y)^T(x-y) + \frac{L}{2}||x-y||^2 $$

Adding the two inequalities, we have

$$ (\nabla f(x)-\nabla f(y))^T(x-y) \leq L||x-y||^2 $$

Part-2: Prove that the above inequality holds when $\nabla f$ is Lipschitz continuous.

By Cauchy-Schwarz inequality we have,

$$ (\nabla f(x) - \nabla f(y))^T(x-y) \leq ||\nabla f(x) - \nabla f(y)||\text{ }||x-y|| $$

Using Lipschitz continuity of $\nabla f$,

$$ (\nabla f(x) - \nabla f(y))^T(x-y) \leq L ||x-y||\text{ }||x-y|| $$

Thus,

$$ (\nabla f(x) - \nabla f(y))^T(x-y) \leq L ||x-y||^2 $$