I am asking this question from the perspective of linear regression.
To make gradient descent faster we use SGD. But SGD picks a single sample at random, computes the gradient from it, and moves in that direction.
What if the direction given by that sample is wrong? How does SGD still manage to get closer to the global minimum?
Assuming the samples used for SGD are iid, each single-sample gradient is an unbiased estimator of the full gradient over your dataset. Any individual step may point in the wrong direction, but in expectation the steps move you toward the optimum, and the errors average out over many iterations. One benefit of the randomness in SGD is that the noise can sometimes dislodge you from local optima. To reduce the noise, though, people often use "mini-batches", where the gradient is computed over a small batch of iid samples rather than a single one.
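To make this concrete, here is a minimal sketch (synthetic data and all parameter values are my own choices, not from your setup): fitting y = 3x + 2 with single-sample SGD and with mini-batch SGD. Individual steps are noisy, yet both runs land near the true coefficients because the per-sample gradients are unbiased.

```python
import numpy as np

# Synthetic linear-regression data: y = 3x + 2 plus small noise.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-1, 1, size=n)
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=n)

def sgd_linreg(X, y, lr=0.05, epochs=20, batch_size=1, seed=0):
    """Fit w, b by SGD on squared error.
    batch_size=1 is plain single-sample SGD; larger values give mini-batch SGD."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    idx = np.arange(len(X))
    for _ in range(epochs):
        rng.shuffle(idx)  # reshuffle each epoch so batches look iid
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch], y[batch]
            err = (w * xb + b) - yb          # residuals on this batch
            # Gradient of mean squared error on the batch -- a noisy
            # but unbiased estimate of the full-dataset gradient:
            grad_w = 2.0 * np.mean(err * xb)
            grad_b = 2.0 * np.mean(err)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

w1, b1 = sgd_linreg(X, y, batch_size=1)     # single-sample SGD
w32, b32 = sgd_linreg(X, y, batch_size=32)  # mini-batch SGD (less noisy steps)
print(w1, b1)   # both runs end up close to w=3, b=2
print(w32, b32)
```

The mini-batch run takes smoother steps (the batch average has lower variance), but both converge to roughly the same place; that is the "unbiased in expectation" property in action.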