I am trying to solve exercise 3.5 in Optimization for Data Analysis but can't seem to get it quite right.
The exercise is as follows: Suppose that $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a strongly convex function with modulus $m$, an $L$-Lipschitz gradient, and (unique) minimizer $x^*$ with function value $f^* = f(x^*)$. Use the co-coercivity property and the fact that $\nabla f(x^*) = 0$ to prove that the $k$th iterate of the steepest-descent method applied to $f$ with step length $\frac{2}{L+m}$ satisfies $$\|x_k - x^* \|\leq \left(\frac{L-m}{L+m}\right)^k \|x_0 - x^*\|.$$
The co-coercivity property is as follows: $$(\nabla f(x) - \nabla f(y))^T(x-y) \geq \frac{1}{L}\|\nabla f(x) - \nabla f(y) \|^2$$
I have tried starting from the inequality $f(x + \alpha d) \leq f(x) + \alpha \nabla f(x)^T d + \alpha^2\frac{L}{2} \|d\|^2$ and plugging in the step size $\alpha = \frac{2}{L+m}$ and the direction $d = -\nabla f(x)$. I then want to apply the co-coercivity property, and probably the fact that $f$ is $m$-strongly convex, but I am not sure how. I also can't seem to get rid of the recursion when setting the two points to $x_{k+1}$ and $x_k$.
Any suggestions or tips would be greatly appreciated. Thanks!
Since your function is $m$-strongly convex, the function $$ g(x) = f(x) - \frac{m}{2} \|x\|^2 $$ is convex and $(L - m)$-smooth (I believe this is an exercise in the same chapter of the book you are reading). Therefore, you can apply co-coercivity to the function $g$ with modulus $L - m$: $$ \langle \nabla g(y) - \nabla g(x), y - x \rangle \geq \frac{1}{L - m} \|\nabla g(y) - \nabla g(x)\|^2. $$
If you expand this (using $\nabla g(x) = \nabla f(x) - m x$ and collecting terms) you should arrive at the following inequality: $$ \langle \nabla f(y) - \nabla f(x), y - x \rangle \geq \frac{mL}{m+L}\|y - x\|^2 + \frac{1}{m+L}\|\nabla f(y) - \nabla f(x)\|^2. $$
The above inequality is key to proving the descent property you want: apply it with $y = x_k$ and $x = x^*$, use $\nabla f(x^*) = 0$, and expand $\|x_{k+1} - x^*\|^2 = \left\|x_k - \frac{2}{L+m}\nabla f(x_k) - x^*\right\|^2$. The squared-gradient terms cancel exactly, leaving $\|x_{k+1} - x^*\|^2 \leq \left(\frac{L-m}{L+m}\right)^2 \|x_k - x^*\|^2$, and the claimed bound follows by induction on $k$.
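If it helps to convince yourself the rate is right before writing the proof, here is a small numerical sanity check (not part of the proof, and the quadratic test function is my own choice): steepest descent with step $\frac{2}{L+m}$ on $f(x) = \frac{1}{2}x^T A x$, where $A$ has spectrum in $[m, L]$ and minimizer $x^* = 0$, should contract at rate $\frac{L-m}{L+m}$ per iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

m, L = 1.0, 10.0                 # strong-convexity and smoothness moduli
n = 5

# Build a symmetric A with eigenvalues spread over [m, L],
# so f(x) = 0.5 x^T A x is m-strongly convex with L-Lipschitz gradient.
eigs = np.linspace(m, L, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T

alpha = 2.0 / (L + m)            # the step length from the exercise
rate = (L - m) / (L + m)         # the claimed contraction factor

x = rng.standard_normal(n)
r0 = np.linalg.norm(x)           # ||x_0 - x*||, since x* = 0 here
for k in range(1, 51):
    x = x - alpha * (A @ x)      # steepest descent: grad f(x) = A x
    # Check ||x_k - x*|| <= rate^k * ||x_0 - x*|| (small tolerance for rounding).
    assert np.linalg.norm(x) <= rate**k * r0 + 1e-12

print("bound holds for 50 iterations")
```

For this quadratic the check is exact: the iteration matrix is $I - \alpha A$, whose eigenvalues $1 - \alpha \lambda$ for $\lambda \in [m, L]$ all have magnitude at most $\frac{L-m}{L+m}$ when $\alpha = \frac{2}{L+m}$.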