How to get this inequality in Gradient Descent algorithm?

253 Views Asked by At

I'm reading Robert M. Gower - Convergence Theorems for Gradient Descent about Gradient Descent algorithm. I would like to ask how the author applies inequality (14) $$\langle\nabla f(y)-\nabla f(x), y-x\rangle \geq \frac{1}{L}\|\nabla f(x)-\nabla f(y)\|$$ to get $$-2 \frac{1}{L}\left\langle x^{t}-x^{*}, \nabla f\left(x^{t}\right)\right\rangle+\frac{1}{L^{2}}\left\|\nabla f\left(x^{t}\right)\right\|_{2}^{2} \le -\frac{1}{L^{2}}\left\|\nabla f\left(x^{t}\right)\right\|_{2}^{2}$$

I don't follow how to get rid of $\nabla f(x^\star)$. Thank you so much for your help!

enter image description here Here $$x^{*}=\arg \min _{x \in \mathbb{R}^{d}} f(x)$$ enter image description here

1

There are 1 best solutions below

0
On BEST ANSWER

Fist of all I suspect there's a typo in (14), as I find a very similar lemma on page 15 (PDF page 17) of this stating the norm on the RHS should be squared. $$ \langle y - x, \nabla f(y) - \nabla f(x) \geq \frac{1}{L} \lVert \nabla f(y) - \nabla f(x) \rVert_2^2 $$

Now: If $x^*$ is to be a stationary point of the gradient method in the context defined above then the gradient must vanish at that point, $\nabla f(x^*) = \mathbf{0}$. This means that the innerproduct $$ \langle x^t - x^*, \nabla f(x^t) \rangle = \langle x^t - x^*, \nabla f(x^t) - \nabla f(x^*) \rangle. $$ Using (14) on this innerproduct yields the inequality $$ \langle x^t - x^*, \nabla f(x^t) - \nabla f(x^*) \rangle \geq \frac{1}{L} \lVert \nabla f(x^t) - \nabla f(x^*) \rVert_2^2. $$ By multiplying both sides with $\frac{-2}{L}$ (and flipping the inequality) and inserting for $\nabla f(x^*) = \mathbf{0}$ we arrive at the inequality $$ -\frac{2}{L} \langle x^t - x^*, \nabla f(x^*) \rangle \leq -\frac{2}{L^2} \lVert \nabla f(x^t) \rVert_2^2 $$ which means that $$ \lVert x^t - x^* \rVert_2^2 - \frac{2}{L} \langle x^t - x^*, \nabla f(x^*) \rangle + \frac{1}{L^2} \lVert \nabla f(x^t) \rVert_2^2 \leq \lVert x^t - x^* \rVert_2^2 - \frac{2}{L^2} \lVert \nabla f(x^t) \rVert^2_2 + \frac{1}{L^2} \lVert \nabla f(x^t) \rVert_2^2 \leq \lVert x^t - x^* \rVert_2^2 + \frac{1}{L^2} \lVert \nabla f(x^t) \rVert_2^2 $$ If there isn't a typo in the original (14) you end up with $$ - \frac{2}{L^2} \lVert \nabla f(x^t) \rVert + \frac{1}{L^2} \lVert \nabla f(x^t) \rVert_2^2 \leq \frac{1}{L^2} \lVert \nabla f(x^t) \rVert_2^2$$ which means the proof still holds but I strongly suspect that there is a typo.