Why do we use average iterates when implementing SGD?


I have read many papers about stochastic gradient descent (SGD). One thing I am curious about is that many non-asymptotic convergence results are stated in terms of the error between the optimal solution and the average of the iterates generated by SGD, for example $\mathbb{E}\left[f(\bar x^k)-f(x^*)\right]$ or $\mathbb{E}\left\| \bar x^k-x^* \right\|^2$. My question is: why do we want to use the average of the iterates? Why don't we just analyze the final iterate of SGD? Thank you!
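For concreteness, here is a minimal sketch of the kind of averaging I mean (Polyak–Ruppert averaging), on a toy one-dimensional problem with a noisy gradient. The objective, step-size schedule, and noise level are all assumptions chosen just for illustration:

```python
import random

random.seed(0)

# Toy problem: minimize f(x) = (x - x_star)^2, observing only a
# noisy gradient 2*(x - x_star) + noise at each step.
x_star = 1.0

x = 0.0       # current iterate x^k
x_bar = 0.0   # running average of iterates, \bar x^k
n_steps = 20000

for k in range(1, n_steps + 1):
    noise = random.gauss(0, 1)
    grad = 2 * (x - x_star) + noise   # stochastic gradient
    x -= grad / (2 * k ** 0.5)        # step size ~ 1/sqrt(k)
    x_bar += (x - x_bar) / k          # incremental average: mean of x^1..x^k

print("last iterate error:", abs(x - x_star))
print("averaged iterate error:", abs(x_bar - x_star))
```

Running this, the last iterate keeps fluctuating at roughly the scale of (step size × noise), while the average smooths those fluctuations out, which is one intuition for why the non-asymptotic bounds are stated for $\bar x^k$.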