Stochastic gradient descent for a function of multiple variables?

I understand that SGD is used for large data sets, where the full gradient (a sum over all samples) can be approximated by the gradient of a randomly chosen sample. My question: suppose I have a function of multiple variables, say $d$ of them, i.e. just one sample. Could one use stochastic gradient descent on such a function? I ask because one of the homework problems in Gilbert Strang's data science course asks you to compute a single step of gradient descent for a function of two variables, and it explicitly says full gradient descent, not stochastic. I wonder why?
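
For reference, here is a minimal sketch of what a single step of full gradient descent on a two-variable function looks like. The function $f(x,y) = x^2 + 2y^2$, the starting point, and the step size are all made up for illustration; this is not the actual homework problem:

```python
import numpy as np

# Illustrative function f(x, y) = x^2 + 2*y^2 (assumed, not from the course).
def grad_f(v):
    x, y = v
    return np.array([2 * x, 4 * y])

v = np.array([1.0, 1.0])   # starting point
lr = 0.1                   # step size (learning rate)

# One step of full gradient descent: move against the full gradient,
# which involves both variables simultaneously.
v = v - lr * grad_f(v)
print(v)  # -> [0.8 0.6]
```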

1 Answer

I think you're mixing up two things here. In Stochastic Gradient Descent (SGD) you use all the variables, but not all the data. In Coordinate Descent (CD) you use only some of the variables, but all of the data.

To give an example, suppose you have a problem with two variables, $x, y$, and 10 data points. With SGD you calculate the gradient with respect to both $x$ and $y$, but at each sub-iteration you use only part of the data points to compute it, say 5 of them; in the next sub-iteration you use the other 5. Over one "meta-iteration" you have used all the data. With CD, you use all 10 data points, but in the first sub-iteration you take a step only in the (negative) direction of the gradient with respect to $x$, and in the second sub-iteration you step along the gradient with respect to $y$. One "meta-iteration" has used all the variables. (See the sketch below.)
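
To make the contrast concrete, here is a minimal Python sketch on a toy least-squares problem with 10 data points and 2 variables, running one "meta-iteration" of each method. All names, data, and step sizes are made up for illustration:

```python
import numpy as np

# Toy least-squares problem: 10 data points, 2 variables w = [w0, w1]
# (the x, y of the example above). Data is randomly generated.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 2))       # 10 data points, 2 variables
b = A @ np.array([1.0, -2.0])      # targets from a known model
lr = 0.1

def grad(w, rows):
    """Gradient of 0.5*||A w - b||^2 restricted to the given data rows."""
    Ar, br = A[rows], b[rows]
    return Ar.T @ (Ar @ w - br)

# --- SGD: all variables, half the data per sub-iteration ---
w_sgd = np.zeros(2)
for batch in (range(0, 5), range(5, 10)):   # two sub-iterations = one "meta-iteration"
    w_sgd -= lr * grad(w_sgd, list(batch))  # gradient w.r.t. both variables, 5 points

# --- Coordinate descent: all data, one variable per sub-iteration ---
w_cd = np.zeros(2)
for i in (0, 1):                            # one sub-iteration per coordinate
    g = grad(w_cd, list(range(10)))         # gradient uses all 10 data points
    w_cd[i] -= lr * g[i]                    # but we step in one coordinate only

print(w_sgd, w_cd)  # different iterates: the two algorithms genuinely differ
```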

Notice that this is a substantial difference! The two may sound similar, but they are completely different algorithms and lead to different results.