I am interested in learning about accelerated stochastic gradient descent.
However, unlike in the non-stochastic case, I have found a large number of seemingly different algorithms all described as accelerated stochastic gradient descent.
I also couldn't work out what makes them "accelerated", because they offer different types of bounds: some in terms of regret, others in terms of iteration complexity, and others seem to focus on the concept of variance reduction.
For example, here is a list of "accelerated" stochastic gradient descent methods I found:
http://www.cs.tau.ac.il/~wolf/deeplearningmeeting/pdfs/FastStochastic.pdf
https://arxiv.org/pdf/1704.08227.pdf
https://arxiv.org/pdf/1412.6980.pdf
https://arxiv.org/pdf/1704.07953.pdf
https://arxiv.org/pdf/1506.03016.pdf
I am a bit lost in this vast literature. Can someone experienced in this subject suggest which algorithm(s) should be considered "accelerated stochastic gradient descent"?
As mentioned in my comment, there are multiple variations of stochastic gradient descent that could potentially fit your description (all of which are "well-studied with provable guarantees"). "Accelerated Stochastic Gradient Descent" is not a term I have encountered in any of the literature I've come across.
One example of an algorithm that could fit what you're looking for is standard SGD with a momentum term added. The weight update rule would look like:
$$ w := w - \eta\nabla f + \alpha \Delta w$$
where $f$ is the function we're trying to minimize, $\eta$ is the learning rate, $\alpha$ is the momentum coefficient, and $\Delta w$ is the weight update from the previous step.
The momentum term aims to correct slow convergence toward a local minimum along a given direction. Basically, if the gradient does not change sign, the momentum term keeps growing, allowing gradient descent to "cover more ground", so to speak.
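To make the update rule concrete, here is a minimal sketch of SGD with momentum. The objective $f(w) = \tfrac{1}{2}w^2$ (gradient simply $w$) and the hyperparameter values are illustrative choices of mine, not from any of the papers above; a truly stochastic version would replace `grad_f` with a gradient estimated from a random mini-batch at each step.

```python
# Sketch of gradient descent with a momentum term (heavy-ball style).
# Toy objective for illustration: f(w) = 0.5 * w**2, so grad f(w) = w.
def grad_f(w):
    return w  # gradient of 0.5 * w**2

eta = 0.1      # learning rate (illustrative value)
alpha = 0.9    # momentum coefficient (illustrative value)
w = 5.0        # initial weight
delta_w = 0.0  # previous update, initially zero

for _ in range(500):
    # update rule from above: w := w - eta * grad(f) + alpha * delta_w
    delta_w = -eta * grad_f(w) + alpha * delta_w
    w = w + delta_w

print(abs(w) < 1e-6)  # prints True: w has converged to the minimum at 0
```

Because the gradient keeps pointing toward the minimum, `delta_w` accumulates across iterations, which is exactly the "covering more ground" behavior described above.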