Why is Nesterov's accelerated gradient different for convex and strongly convex functions?


In arXiv:1405.4980 (Convex Optimization: Algorithms and Complexity), Sébastien Bubeck gives two different expressions of Nesterov's accelerated gradient (NAG) algorithm. For convex functions, he writes the following:


[Image: Bubeck's NAG update rules for smooth convex functions]


For strongly convex functions, he writes the following:


[Image: Bubeck's NAG update rules for smooth strongly convex functions]
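For readers without the images: the two schemes can be transcribed roughly as follows (from memory of the monograph, using Bubeck's index $s$ and step size $1/\beta$, so verify against the source):

```latex
% Convex case: time-varying momentum from the lambda/gamma recursion
\lambda_0 = 0, \qquad
\lambda_s = \frac{1 + \sqrt{1 + 4\lambda_{s-1}^2}}{2}, \qquad
\gamma_s = \frac{1 - \lambda_s}{\lambda_{s+1}},
\qquad
y_{s+1} = x_s - \tfrac{1}{\beta}\,\nabla f(x_s), \qquad
x_{s+1} = (1 - \gamma_s)\, y_{s+1} + \gamma_s\, y_s.

% Strongly convex case: fixed momentum, kappa = beta / alpha
y_{s+1} = x_s - \tfrac{1}{\beta}\,\nabla f(x_s), \qquad
x_{s+1} = \Bigl(1 + \tfrac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Bigr)\, y_{s+1}
        - \tfrac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\, y_s.
```

Note the second scheme is the first with $\gamma_s$ frozen at $\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$, which is exactly the point the accepted answer makes.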


What is the justification for the two different forms?

This distinction doesn't seem to be enforced in many other references.

1 Answer (accepted)

Oh, this is definitely present in other places, including Nesterov's own texts on the subject.

I'm pretty sure there's a typo in the first one; $\gamma_s$ should be $\gamma_t$. And with that typo fixed, what you see is that in the strongly convex case, the update rule for $\gamma_t$ is discarded in favor of a fixed value $\gamma_t \equiv \frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$.

You can still use the first $\gamma_t$ update rule for strongly convex functions if you want to---but the performance won't be as good. Assuming strong convexity lets you tighten things up and achieve better performance.

Note that $\kappa=\beta/\alpha$, where $\beta$ is the Lipschitz constant of the gradient (the smoothness constant) and $\alpha$ is the strong convexity constant. So the "stronger" the convexity, the larger $\alpha$, the smaller $\kappa$, and the closer $\gamma_t$ is to 0. This in turn means you're using less acceleration: the "optimal" first-order method gets closer to standard gradient descent the more strongly convex the function is.
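To make the difference concrete, here is a rough sketch of the two variants in Python on a toy quadratic. The test problem, step counts, and function names are mine, not from the monograph:

```python
import numpy as np

def nag_convex(grad, x0, beta, steps):
    """NAG with the time-varying lambda/gamma schedule (convex case)."""
    x, y_prev = x0.astype(float).copy(), x0.astype(float).copy()
    lam = 1.0  # lambda_1 (lambda_0 = 0 gives lambda_1 = 1)
    for _ in range(steps):
        lam_next = (1 + np.sqrt(1 + 4 * lam**2)) / 2
        gamma = (1 - lam) / lam_next          # gamma_t <= 0: extrapolation weight
        y = x - grad(x) / beta                # plain gradient step
        x = (1 - gamma) * y + gamma * y_prev  # momentum combination
        y_prev, lam = y, lam_next
    return x

def nag_strongly_convex(grad, x0, alpha, beta, steps):
    """NAG with gamma frozen at (1 - sqrt(kappa)) / (1 + sqrt(kappa))."""
    kappa = beta / alpha
    gamma = (1 - np.sqrt(kappa)) / (1 + np.sqrt(kappa))
    x, y_prev = x0.astype(float).copy(), x0.astype(float).copy()
    for _ in range(steps):
        y = x - grad(x) / beta
        x = (1 - gamma) * y + gamma * y_prev
        y_prev = y
    return x

# Toy strongly convex quadratic f(x) = 0.5 * x @ diag(d) @ x,
# with alpha = 1 (smallest curvature) and beta = 100 (largest), so kappa = 100.
d = np.array([1.0, 100.0])
grad = lambda x: d * x
x0 = np.ones(2)

x_cvx = nag_convex(grad, x0, beta=100.0, steps=200)
x_sc = nag_strongly_convex(grad, x0, alpha=1.0, beta=100.0, steps=200)
# The fixed-gamma variant converges linearly, so after the same number of
# steps it should sit much closer to the minimizer x* = 0.
print(np.linalg.norm(x_cvx), np.linalg.norm(x_sc))
```

Running this, the fixed-$\gamma$ iterate ends up orders of magnitude closer to the optimum, which is the "tightening up" the answer describes: the $O(1/t^2)$ schedule still works on strongly convex problems, but the fixed schedule exploits $\alpha > 0$ to get a linear rate.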