I am going through the proof of the convergence rate of gradient descent, following this link.
However, I am completely stuck on how the author went from equation (18) to equation (19) (boxed in red).
Can someone please help with the derivation of equation (19)?
The proof is attached below:
Per request, the content of Corollary 2.6 is shown below (stated without proof; it is a well-known bound for gradient descent):


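Note that, by the binomial formula, the term in (19) expands as
\begin{eqnarray}
\frac{1}{2\alpha}\left(||x^{(i-1)} - x^*||^2_2 - ||x^{(i-1)} - x^* - \alpha \nabla f(x^{(i-1)})||^2_2 \right)
&=& \frac{1}{2\alpha}\left(||x^{(i-1)} - x^*||^2_2 - ||x^{(i-1)} - x^*||^2_2 + 2\alpha \nabla f(x^{(i-1)})^T (x^{(i-1)} - x^*) - \alpha^2 ||\nabla f(x^{(i-1)})||_2^2\right) \\
&=& \nabla f(x^{(i-1)})^T (x^{(i-1)} - x^*) - \frac{\alpha}{2}||\nabla f(x^{(i-1)})||_2^2 \\
&\le& ||\nabla f(x^{(i-1)})||_2 \, ||x^{(i-1)} - x^*||_2 - \frac{\alpha}{2}||\nabla f(x^{(i-1)})||_2^2 .
\end{eqnarray}
The cross term of the expansion is the inner product $\nabla f(x^{(i-1)})^T (x^{(i-1)} - x^*)$, not the product of the norms, so the first two steps are equalities and only the last step uses Cauchy-Schwarz. The last line is $\ge (18)$, by Cauchy-Schwarz (but in general not $=$); the line that keeps the inner product is the exact value of this term in (19).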
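
As a sanity check on the expansion, here is a minimal numerical sketch in plain Python. The names `a` (standing for $x^{(i-1)} - x^*$), `g` (standing for $\nabla f(x^{(i-1)})$) and the helper `check` are hypothetical and chosen only for illustration; the vectors are random, not taken from any particular $f$. It compares the squared-norm difference with the version whose cross term is an inner product, and with the version whose cross term is a product of norms.

```python
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return dot(u, u) ** 0.5


def check(a, g, alpha):
    """Return (lhs, inner_form, norm_form) for a = x - x*, g = grad f(x).

    lhs        = (||a||^2 - ||a - alpha*g||^2) / (2*alpha)
    inner_form = <g, a> - (alpha/2) * ||g||^2
    norm_form  = ||g|| * ||a|| - (alpha/2) * ||g||^2
    """
    shifted = [ai - alpha * gi for ai, gi in zip(a, g)]
    lhs = (dot(a, a) - dot(shifted, shifted)) / (2 * alpha)
    inner_form = dot(g, a) - alpha / 2 * dot(g, g)
    norm_form = norm(g) * norm(a) - alpha / 2 * dot(g, g)
    return lhs, inner_form, norm_form


random.seed(0)
a = [random.gauss(0, 1) for _ in range(5)]  # hypothetical x - x*
g = [random.gauss(0, 1) for _ in range(5)]  # hypothetical gradient
lhs, inner_form, norm_form = check(a, g, alpha=0.1)

print(abs(lhs - inner_form) < 1e-9)  # expansion with the inner product is exact
print(inner_form <= norm_form)       # Cauchy-Schwarz gives only an upper bound
```

Both checks should print `True`. Unless `a` and `g` happen to be positively collinear, `norm_form` is strictly larger than `lhs`.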