Gradient descent proof: show that $f(x^k) - f(x^*) \leq \|x^0 - x^*\|_2^2/2\alpha k$

353 Views Asked by At

I am going through the proof of the convergence rate of gradient descent, following this link.

However, I am completely stuck on how the author went from equation (18) to equation (19) (boxed in red).

Can someone please provide some assistance to the derivation of equation (19).

The proof is attached below:

enter image description here

Per request, the content of Corollary 2.6 is shown below (without proof, but is a well-known bound for GD)

enter image description here

2

There are 2 best solutions below

0
On BEST ANSWER

Note that (19), by the binomial formula, is equal to \begin{eqnarray} \frac{1}{2\alpha}\left(||x^{(i-1)} - x^*||^2_2 - ||x^{(i-1)} - x^* - \alpha \nabla f(x^{(i-1)}||^2_2) \right)= \frac{1}{2\alpha}\left(||x^{(i-1)} - x^*||^2_2 - ||x^{(i-1)} - x^*||^2_2 +2\alpha ||\nabla f(x^{(i-1)})||_2||x^{(i-1)} - x^*||_2 - \alpha^2 ||\nabla f(x^{(i-1)})||^2\right)= ||\nabla f(x^{(i-1)})||_2||x^{(i-1)} - x^*||_2 -\frac{\alpha}{2}||\nabla f(x^{(i-1)})||_2^2 \end{eqnarray} This is $\ge (18)$, by Cauchy-Schwartz (but in general not $=$).

2
On

I would post this as a comment, but I don't have enough flair. Here are some suggestions. It's something similar to ''completing the squares'' argument for scalars. Try working backwards to simplify (19), you can get to equation (18) by noting (a) $\|x\|^2 = x^T x$ and (b) $x^Ty = y^Tx$ (for real vectors at least).

Or, said another way, if you really want to go from (18) to (19), you could, add and subtract $\|x^{(i-1)} - x^*\|^2$ and proceed from there by appropriately normalizing using $\alpha$.