How to use co-coercity property to prove $\|x_k - x^* \|\leq (\frac{L-m}{L+m})^k \|x_0 - x^*\|$ for steepest descent with $\alpha = \frac{2}{L+m}$?

101 Views Asked by At

I am trying to solve exercise 3.5 in Optimization for Data Analysis but can't seem to get it quite right.

The exercise is as follows: Suppose that $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a strongly convex function with modulus m, an L-Lipschitz gradient, and (unique) minimizer $x^*$ with function value $f^* = f (x^*)$. Use the co-coercivity property and the fact that $\nabla f (x^*) = 0$ to prove that the $k$th iterate of the steepest-descent method applied to f with step-length $\frac{2}{L+m}$ satisfies $$\|x_k - x^* \|\leq (\frac{L-m}{L+m})^k \|x_0 - x^*\|$$

The co-coercity property is as follows: $$(\nabla f(x) - \nabla f(y))^T(x-y) \geq \frac{1}{L}\|\nabla f(x) - \nabla f(y) \|^2$$

I have tried starting from the inequality $f(x + \alpha d) \leq f(x) + \alpha \nabla f(x)^T d + \alpha^2\frac{L}{2} \|d\|^2$ and plugging in the step size $\alpha = \frac{2}{L+m}$ and the direction $d = -\nabla f(x)$. I then want to apply the co-coercity property and probably the fact that $f$ is m-strongly convex but I am not sure how. I also can't seem to get rid of the recursion when setting the two points to $x^{k+1}$ and $x^k$.

Any suggestions or tips would be greatly appreciated. Thanks!

1

There are 1 best solutions below

0
On

Since your function is $m$-strongly convex, then the function $$ g(x) = f(x) - \frac{m}{2} \|x\|^2 $$ is convex and $(L - m)$ smooth (I believe this is an exercise in the same chapter of the book you are reading). Therefore, you can apply co-coercivity to the function $g$ with modulus $(L - m)$: $$ \langle \nabla g(y) - \nabla g(x), y - x \rangle \geq \frac{1}{L - m} \|\nabla g(y) - \nabla g(x)\|^2. $$

If you expand this you should arrive at the following inequality:

$$ \langle \nabla f(y) - \nabla f(x), y - x \rangle \geq \frac{mL}{m + L} \|x - y\|^2 + \frac{1}{m + L} \|\nabla f(x) - \nabla f(y)\|^2. $$

The above inequality is key to proving the descent property you want.