On a version of gradient descent


I am trying to read this paper and have gotten stuck. The author considers the problem of minimizing a convex function whose gradient has coordinate-wise Lipschitz constant $M$ (meaning that for all vectors $x$ and basis vectors $e_i$, we have $||f'(x+h e_i)-f'(x)||_2 \leq M|h|$) and considers the scheme $$ x(t+1) = x(t) - \frac{1}{M} [f'(x(t))]_k e_k$$ where $[\cdot]_i$ denotes the $i$'th entry of a vector and $k$ is the index maximizing $|[f'(x(t))]_k|$. In words, we update by moving in the direction opposite the largest component of the gradient.
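To make sure I'm reading the scheme correctly, here is a small numerical sketch of it on a toy quadratic (the choices of $A$, $b$, and the bound $M$ are mine, not from the paper):

```python
import numpy as np

# Toy problem: f(x) = 0.5 x^T A x - b^T x, so f'(x) = A x - b.
# A, b, M below are illustrative choices, not taken from the paper.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # symmetric positive definite
b = np.array([1.0, 1.0])
M = np.abs(A).sum(axis=1).max()          # a conservative coordinate-wise Lipschitz bound

def grad(x):
    return A @ x - b                     # gradient of the quadratic above

x = np.zeros(2)
for t in range(500):
    g = grad(x)
    k = int(np.argmax(np.abs(g)))        # index of the largest gradient entry
    x[k] -= g[k] / M                     # update only coordinate k, step 1/M

x_star = np.linalg.solve(A, b)           # exact minimizer for comparison
print(np.allclose(x, x_star, atol=1e-8))
```

On this example the iterates do converge to the minimizer, consistent with the convergence claim the paper is proving.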

The scheme is introduced on page 1 (see the big box in the middle of the page), and just two lines later (in the sequence of inequalities after "Then,") the author appears to be using that $$ ||f'(x(t))||_2^2 \geq \frac{\left( f(x(t)) - f(x^*) \right)^2}{||x(0) - x^*||_2^2}$$ where $f(x^*)$ is the global minimum value and $x^*$ is presumably the minimizer (I'm assuming this; I see no definition for $x^*$ in the paper).

Anyway, I don't see why this statement is true. I do see how to prove it with $x(0)$ replaced by $x(t)$ on the right-hand side, so if one could show that $||x(t)-x^*||_2$ is nonincreasing in $t$, that would suffice. But I don't see how to show that either; it would seem to require showing that $[f'(x(t))]_k$ has the same sign as $[x(t)-x^*]_k$, and I don't see how to argue that.
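For reference, the version with $x(t)$ that I can prove is just convexity plus Cauchy–Schwarz: $$ f(x(t)) - f(x^*) \leq \langle f'(x(t)), x(t) - x^* \rangle \leq ||f'(x(t))||_2 \, ||x(t) - x^*||_2,$$ and squaring and dividing by $||x(t)-x^*||_2^2$ gives $$ ||f'(x(t))||_2^2 \geq \frac{\left( f(x(t)) - f(x^*) \right)^2}{||x(t) - x^*||_2^2}.$$ The author's inequality is this with $x(t)$ replaced by $x(0)$ in the denominator, which is the step I cannot justify.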