A basic question on stochastic gradient descent


Consider a stochastic gradient iteration:

$$\theta_{k+1} = \theta_{k} - \gamma_k F(\theta_k)$$

where $F$ is a noisy estimate of the gradient $\nabla f$.
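For concreteness, here is a minimal sketch of that iteration in Python; the one-dimensional quadratic objective $f(\theta) = \tfrac12\theta^2$ and the Gaussian gradient noise are purely illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(theta):
    # true gradient of the illustrative objective f(theta) = 0.5 * theta**2
    return theta

def F(theta):
    # noisy gradient estimate: true gradient plus zero-mean Gaussian noise
    return grad_f(theta) + rng.normal(scale=0.1)

theta = 5.0
for k in range(1, 1001):
    gamma_k = 1.0 / k              # diminishing step size gamma_k
    theta = theta - gamma_k * F(theta)
```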

Now, a book says that it converges in the following sense: $f(\theta_k)$ converges and $\nabla f(\theta_k)$ converges to zero, and it adds that this is the strongest possible result for gradient-related stochastic approximation.

What does this mean? Why doesn't it show convergence of the iterates $\theta_k$ themselves?


There is 1 answer below.

Best answer:

The statement is phrased this way because, with noisy gradient estimates, the iterates $\theta_k$ need not settle at the minimizer: they may keep fluctuating in a small neighborhood around it rather than converging to the point itself. That is why the guarantee is given in terms of $f(\theta_k)$ and $\nabla f(\theta_k)$ rather than convergence of the iterates.
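A quick way to see this numerically is to run the same iteration with a constant step size and look at the late iterates: $f(\theta_k)$ flattens out and the gradient is small on average, but $\theta_k$ keeps jittering in a band around the minimizer. A minimal sketch, again assuming the illustrative quadratic $f(\theta) = \tfrac12\theta^2$ with Gaussian gradient noise (my assumption, not the book's setting):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_grad(theta, noise=0.1):
    # gradient of f(theta) = 0.5 * theta**2 plus zero-mean Gaussian noise
    return theta + rng.normal(scale=noise)

theta = 5.0
gamma = 0.05                       # constant step size
tail = []                          # record the late iterates
for k in range(5000):
    theta -= gamma * noisy_grad(theta)
    if k >= 4000:
        tail.append(theta)

tail = np.array(tail)
# the iterates do not settle at theta = 0; they hover in a small band around it
print("mean of late iterates:", tail.mean())
print("std  of late iterates:", tail.std())
```

With a diminishing step size satisfying the usual conditions ($\sum_k \gamma_k = \infty$, $\sum_k \gamma_k^2 < \infty$), that band shrinks, which is exactly the regime the book's convergence statement addresses.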

Take a look at this video on stochastic gradient descent and it should clear things up: https://class.coursera.org/ml-005/lecture/105