I am following some old lecture notes that I found online (source), and I was hoping someone could provide some insight into a bound regarding gradient descent.
The goal is to solve $\min_{x\in\mathbb{R}^n}f(x)$.
One approach is gradient descent, which, starting from an initial point $x_0$, updates the iterate as
\begin{align} x_{k+1} = x_k - \alpha_k\nabla f(x_k),\,\alpha_k>0. \end{align}
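For concreteness, here is a minimal sketch of this update (my own illustration, not from the notes), run on the toy convex objective $f(x) = \tfrac12\|x\|^2$ with a fixed step size:

```python
import numpy as np

# Toy convex objective f(x) = 0.5 * ||x||^2, whose gradient is grad f(x) = x.
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

x = np.array([3.0, -4.0])   # initial point x_0
alpha = 0.1                 # fixed step size alpha_k = 0.1
for k in range(100):
    # the update x_{k+1} = x_k - alpha_k * grad f(x_k)
    x = x - alpha * grad_f(x)

print(f(x))  # close to the minimum value f* = 0
```

Each step here multiplies $x$ by $(1-\alpha)$, so the iterates shrink geometrically toward the minimizer $x^*=0$.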
For all $k\ge1$, we have (this is what I don't understand, see slide 9) \begin{align} f(x_k) - f^* \le \frac{\|x_0-x^*\|^2 + \sum_{t=0}^{k-1}\alpha_t^2\|\nabla f(x_t)\|^2}{2\sum_{t=0}^{k-1}\alpha_t} \end{align}
Does anyone see where the above bound could have possibly come from?
I'll assume that $f$ is convex (based on the notes). For any $x$ in $\mathbb R^n$ we have, at iteration $k$,
$$\begin{align}\|x_{k+1}-x\|^2 & = \|x_{k} - \alpha_k\nabla f(x_k) - x\|^2 \\ & = \|x_{k} - x\|^2 - 2 \alpha_k\nabla f(x_k)^T (x_k - x) + \alpha_k^2 \|\nabla f(x_k)\|^2. \end{align}$$
On the other hand, rearranging the first-order convexity inequality at $x_k$, we have $$-\nabla f(x_k)^T (x_k - x) \leq f(x)-f(x_k),$$ (cf. Boyd's book, p. 69, eq. 3.2), so we can bound the last equality: $$\begin{align}\|x_{k+1}-x\|^2 & \leq \|x_{k} - x\|^2 + 2 \alpha_k (f(x) - f(x_k)) + \alpha_k^2 \|\nabla f(x_k)\|^2. \end{align}$$ Rearranging once more, we get $$2 \alpha_k (f(x_k) - f(x)) \leq \|x_{k} - x\|^2 - \|x_{k+1}-x\|^2 + \alpha_k^2 \|\nabla f(x_k)\|^2. $$ Now sum over $t = 0, \dots, k-1$ (the sum over the middle terms telescopes): $$2\sum_{t=0}^{k-1} \alpha_t (f(x_t) - f(x)) \leq \|x_{0} - x\|^2 - \|x_{k}-x\|^2 + \sum_{t=0}^{k-1}\alpha_t^2 \|\nabla f(x_t)\|^2.$$ To get the desired bound, notice that the term $-\|x_{k}-x\|^2$ is nonpositive and can be dropped from the RHS, and that $f(x_k)\le f(x_t)$ for all $t<k$ (assuming the step sizes make each iteration a descent step; otherwise replace $f(x_k)$ by $\min_{0\le t< k} f(x_t)$), so the LHS is at least $2\sum_{t=0}^{k-1}\alpha_t(f(x_k)-f(x))$. Finally, divide by $2\sum _{t=0}^{k-1} \alpha_t$ to obtain
\begin{align} f(x_k) - f(x) \le \frac{\|x_0-x\|^2 + \sum_{t=0}^{k-1}\alpha_t^2\|\nabla f(x_t)\|^2}{2\sum_{t=0}^{k-1}\alpha_t}, \end{align}
and taking $x = x^*$ gives the stated bound.
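As a sanity check (my own addition, not part of the notes), the bound can be verified numerically on the same toy problem $f(x) = \tfrac12\|x\|^2$ (minimizer $x^* = 0$, $f^* = 0$), tracking the running sums that appear in the derivation:

```python
import numpy as np

# Check that f(x_{k+1}) - f* <= (||x_0 - x*||^2 + sum_t alpha_t^2 ||grad_t||^2)
#                               / (2 * sum_t alpha_t)
# holds at every iteration, for f(x) = 0.5 * ||x||^2 with gradient grad f(x) = x.
grad = lambda x: x
x_star = np.zeros(2)                          # minimizer of the toy objective
x = np.array([3.0, -4.0])                     # initial point x_0
x0_dist_sq = np.linalg.norm(x - x_star) ** 2  # ||x_0 - x*||^2

alpha = 0.1
sum_alpha = 0.0
sum_alpha_sq_grad_sq = 0.0
for k in range(50):
    g = grad(x)
    sum_alpha += alpha
    sum_alpha_sq_grad_sq += alpha ** 2 * np.linalg.norm(g) ** 2
    x = x - alpha * g                          # gradient step
    gap = 0.5 * np.linalg.norm(x) ** 2         # f(x_{k+1}) - f*
    bound = (x0_dist_sq + sum_alpha_sq_grad_sq) / (2 * sum_alpha)
    assert gap <= bound + 1e-12                # the bound holds at this iteration

print("bound holds at every iteration")
```

On this example the bound is quite loose (the gap decays geometrically while the bound only decays like $1/k$ once the numerator stabilizes), which is consistent with it being a worst-case estimate for general convex $f$.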