Why is Nesterov's accelerated gradient different for convex and strongly convex functions?


In arXiv:1405.4980 (Convex Optimization: Algorithms and Complexity), Sébastien Bubeck gives two different expressions of Nesterov's accelerated gradient (NAG) algorithm. For convex functions, he writes the following:


[Image: Bubeck's NAG update rules for smooth convex functions]


For strongly convex functions, he writes the following:


[Image: Bubeck's NAG update rules for smooth strongly convex functions]
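For readers without the images: the two schemes can be transcribed roughly as follows (from memory of the monograph, using Bubeck's index $s$ and step size $1/\beta$, so verify against the source):

```latex
% Convex case: time-varying momentum from the lambda/gamma recursion
\lambda_0 = 0, \qquad
\lambda_s = \frac{1 + \sqrt{1 + 4\lambda_{s-1}^2}}{2}, \qquad
\gamma_s = \frac{1 - \lambda_s}{\lambda_{s+1}},
\qquad
y_{s+1} = x_s - \tfrac{1}{\beta}\,\nabla f(x_s), \qquad
x_{s+1} = (1 - \gamma_s)\, y_{s+1} + \gamma_s\, y_s.

% Strongly convex case: fixed momentum, kappa = beta / alpha
y_{s+1} = x_s - \tfrac{1}{\beta}\,\nabla f(x_s), \qquad
x_{s+1} = \Bigl(1 + \tfrac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Bigr)\, y_{s+1}
        - \tfrac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\, y_s.
```

Note the second scheme is the first with $\gamma_s$ frozen at $\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$, which is exactly the point the accepted answer makes.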


What is the justification for the two different forms?

This distinction doesn't seem to be enforced in many other references.

1 Answer (accepted)

Oh, this is definitely present in other places, including Nesterov's own texts on the subject.

I'm pretty sure there's a typo in the first one; $\gamma_s$ should be $\gamma_t$. And with that typo fixed, what you see is that in the strongly convex case, the update rule for $\gamma_t$ is discarded in favor of a fixed value $\gamma_t \equiv \frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$.

You can still use the first $\gamma_t$ update rule for strongly convex functions if you want to---but the performance won't be as good. Assuming strong convexity lets you tighten things up and achieve better performance.

Note that $\kappa=\beta/\alpha$, where $\beta$ is the Lipschitz constant of the gradient (the smoothness constant) and $\alpha$ is the strong convexity constant. So the "stronger" the convexity, the larger $\alpha$, the smaller $\kappa$, and the closer $\gamma_t$ is to 0. This in turn means you're using less acceleration: the "optimal" first-order method gets closer to standard gradient descent the more strongly convex the function is.
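To make the difference concrete, here is a rough sketch of the two variants in Python on a toy quadratic. The test problem, step counts, and function names are mine, not from the monograph:

```python
import numpy as np

def nag_convex(grad, x0, beta, steps):
    """NAG with the time-varying lambda/gamma schedule (convex case)."""
    x, y_prev = x0.astype(float).copy(), x0.astype(float).copy()
    lam = 1.0  # lambda_1 (lambda_0 = 0 gives lambda_1 = 1)
    for _ in range(steps):
        lam_next = (1 + np.sqrt(1 + 4 * lam**2)) / 2
        gamma = (1 - lam) / lam_next          # gamma_t <= 0: extrapolation weight
        y = x - grad(x) / beta                # plain gradient step
        x = (1 - gamma) * y + gamma * y_prev  # momentum combination
        y_prev, lam = y, lam_next
    return x

def nag_strongly_convex(grad, x0, alpha, beta, steps):
    """NAG with gamma frozen at (1 - sqrt(kappa)) / (1 + sqrt(kappa))."""
    kappa = beta / alpha
    gamma = (1 - np.sqrt(kappa)) / (1 + np.sqrt(kappa))
    x, y_prev = x0.astype(float).copy(), x0.astype(float).copy()
    for _ in range(steps):
        y = x - grad(x) / beta
        x = (1 - gamma) * y + gamma * y_prev
        y_prev = y
    return x

# Toy strongly convex quadratic f(x) = 0.5 * x @ diag(d) @ x,
# with alpha = 1 (smallest curvature) and beta = 100 (largest), so kappa = 100.
d = np.array([1.0, 100.0])
grad = lambda x: d * x
x0 = np.ones(2)

x_cvx = nag_convex(grad, x0, beta=100.0, steps=200)
x_sc = nag_strongly_convex(grad, x0, alpha=1.0, beta=100.0, steps=200)
# The fixed-gamma variant converges linearly, so after the same number of
# steps it should sit much closer to the minimizer x* = 0.
print(np.linalg.norm(x_cvx), np.linalg.norm(x_sc))
```

Running this, the fixed-$\gamma$ iterate ends up orders of magnitude closer to the optimum, which is the "tightening up" the answer describes: the $O(1/t^2)$ schedule still works on strongly convex problems, but the fixed schedule exploits $\alpha > 0$ to get a linear rate.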