I'm reading a lecture note about convergence of Gradient Descent. My problem lies in Theorem 6.1
and its proof
Could you please explain how the author gets the last inequality $$f\left(x^{(k)}\right)-f\left(x^{*}\right) \leq \frac{1}{k} \sum_{i=1}^{k} \left (f\left(x^{(i)}\right)-f\left(x^{*}\right) \right)$$
It seems to me that the correct inequality is actually the opposite one, i.e., $$f(x^{(k)})-f(x^{*}) \ge\frac{1}{k} \sum_{i=1}^{k} \left (f(x^{(i)})-f(x^{*}) \right)$$
My reasoning is as follows: Because $f$ decreasing on every iteration, $$f(x^{(k)})-f(x^{*}) \ge f(x^{(i)})-f(x^{*}) \ge$$ for all $i \le k$. It follows that $$\sum_{i=1}^{k} (f(x^{(k)})-f(x^{*})) \ge \sum_{i=1}^{k} (f(x^{(i)})-f(x^{*}))$$ Thus $$k(f(x^{(k)})-f(x^{*})) \ge \sum_{i=1}^{k} (f(x^{(i)})-f(x^{*}))$$
Thank you so much for your help!





I just formulate what @PhoemueX's explanation in the comment to answer this question.
It follows from $f$ decreases after each step that $$f(x^\star) \le f(x_{k+1}) \le f(x_k) \le \cdots \le f(x_0)$$
Then $$f(x_{k+1})-f(x^\star) \le f(x_n)-f(x^\star)$$ for all $n \le k$. Consequently, $$f(x_{k+1})-f(x^\star) \le \frac{1}{k+1} \sum_{n=0}^k (f(x_{n+1})-f(x^\star)) \le \frac{L\|x_0-x^\star\|^2}{2(k+1)}$$ Finally, we get $$f(x_{k+1})-f(x^\star) \le \frac{L\|x_0-x^\star\|^2}{2(k+1)}$$