stochastic subgradient descent


When using stochastic subgradient descent, the reported solution is $f_{best}(x_k)= \min \{f(x_1),f(x_2),\dots,f(x_k) \}$, i.e., the best objective value over all iterates so far. As I understand it, evaluating this requires computing $f$ on the whole dataset. However, the stochastic method is used precisely because evaluating the subgradient on the whole dataset is very slow.
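To make the $f_{best}$ definition concrete, here is a minimal sketch (not from the post) of subgradient descent on the hypothetical 1-D objective $f(x) = |x|$, tracking the running minimum over iterates:

```python
def subgrad_abs(x):
    # A subgradient of |x|: sign(x), choosing 0 at x = 0.
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def subgradient_descent(x0, steps=100):
    x = x0
    f_best = abs(x)  # f_best(x_1) = f(x_1)
    for k in range(1, steps + 1):
        x = x - (1.0 / k) * subgrad_abs(x)  # diminishing step size 1/k
        f_best = min(f_best, abs(x))        # running minimum over all iterates
    return f_best
```

Because subgradient methods are not descent methods (a step can increase the objective), it is the running minimum $f_{best}$, not the last iterate, that carries the convergence guarantee.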

My question is: what is the advantage of using stochastic subgradient descent rather than plain subgradient descent, if I have to use the whole dataset anyway?


The reason to use stochastic subgradient descent (and, more generally, stochastic gradient descent), even though you may still need to visit the entire dataset occasionally, is computational cost. In full subgradient descent, every single step requires a pass over the entire dataset, whereas a stochastic step computes the subgradient from only a small subset of the data. If your dataset is extremely large (as is increasingly the case in this field), then processing all the data for every step is simply too expensive for realistic hardware, and may not even fit in memory at once. Moreover, there are a number of tricks for parallelizing stochastic gradient descent that have no simple equivalent in full descent (for instance, running updates on different parts of the problem simultaneously and in parallel, then merging the partial results). Note also that you only evaluate $f_{best}$ on the full dataset when you want to report a solution, not at every step, so the expensive full pass happens rarely while the cheap stochastic steps do the bulk of the work.
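The cost argument above can be sketched in code. This is a hedged illustration (the objective, dataset, and all names are hypothetical, not from the answer): for the least-absolute-deviations objective $f(w) = \frac{1}{n}\sum_i |x_i w - y_i|$ with a scalar parameter $w$, a full subgradient step touches all $n$ samples, while a stochastic step touches only a minibatch of size $b$:

```python
import random

random.seed(0)
n = 10_000
xs = [random.uniform(-1, 1) for _ in range(n)]
ys = [2.0 * x + random.gauss(0, 0.1) for x in xs]  # true slope is 2.0

def sample_subgrad(w, i):
    # Subgradient of |x_i * w - y_i| with respect to w.
    r = xs[i] * w - ys[i]
    return xs[i] * (1.0 if r > 0 else (-1.0 if r < 0 else 0.0))

def full_step(w, lr):
    # One full subgradient step: n data accesses.
    g = sum(sample_subgrad(w, i) for i in range(n)) / n
    return w - lr * g

def stochastic_step(w, lr, b=32):
    # One stochastic step: only b data accesses.
    batch = random.sample(range(n), b)
    g = sum(sample_subgrad(w, i) for i in batch) / b
    return w - lr * g
```

For the same data budget, one full step buys you $n/b$ stochastic steps; with $n = 10{,}000$ and $b = 32$, that is roughly 300 cheap steps per expensive one, which is why the stochastic variant makes far more progress per unit of computation early on.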