I am interested in learning about accelerated stochastic gradient descent.
However, unlike in the non-stochastic case, I have found a large number of seemingly different algorithms all described as accelerated stochastic gradient descent.
I also couldn't work out what makes them "accelerated", because they offer different types of bounds: some in terms of regret, others in terms of iteration complexity, and others seem to focus on the concept of variance reduction.
For example, here is a list of "accelerated" stochastic gradient descent methods I found:
http://www.cs.tau.ac.il/~wolf/deeplearningmeeting/pdfs/FastStochastic.pdf
https://arxiv.org/pdf/1704.08227.pdf
https://arxiv.org/pdf/1412.6980.pdf
https://arxiv.org/pdf/1704.07953.pdf
https://arxiv.org/pdf/1506.03016.pdf
I am a bit lost in this vast literature. Can someone experienced in this subject suggest which algorithm(s) should be considered "accelerated stochastic gradient descent"?
As mentioned in my comment, there are multiple variations of stochastic gradient descent that could potentially fit your description (all of which are "well-studied with provable guarantees"). "Accelerated Stochastic Gradient Descent" is not a term I have encountered in any of the literature I've come across.
One example of an algorithm that could fit what you're looking for is standard SGD with a momentum term added. The weight update rule would look like:
$$ w := w - \eta\nabla f + \alpha \Delta w$$
where $f$ is the function we're trying to minimize, $\eta$ is the learning rate, $\alpha$ is the momentum coefficient, and $\Delta w$ is the weight update from the previous step.
The momentum term aims to correct slow convergence toward a local minimum along a given direction. Basically, if the gradient does not change sign, the momentum term keeps growing, allowing gradient descent to "cover more ground", so to speak.
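To make the update rule concrete, here is a minimal sketch of SGD with momentum. The objective $f(w) = \tfrac{1}{2}w^2$ (gradient simply $w$) and the hyperparameter values are illustrative choices of mine, not from any of the papers above; a truly stochastic version would replace `grad_f` with a gradient estimated from a random mini-batch at each step.

```python
# Sketch of gradient descent with a momentum term (heavy-ball style).
# Toy objective for illustration: f(w) = 0.5 * w**2, so grad f(w) = w.
def grad_f(w):
    return w  # gradient of 0.5 * w**2

eta = 0.1      # learning rate (illustrative value)
alpha = 0.9    # momentum coefficient (illustrative value)
w = 5.0        # initial weight
delta_w = 0.0  # previous update, initially zero

for _ in range(500):
    # update rule from above: w := w - eta * grad(f) + alpha * delta_w
    delta_w = -eta * grad_f(w) + alpha * delta_w
    w = w + delta_w

print(abs(w) < 1e-6)  # prints True: w has converged to the minimum at 0
```

Because the gradient keeps pointing toward the minimum, `delta_w` accumulates across iterations, which is exactly the "covering more ground" behavior described above.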