Gradient descent with adaptive learning rate


I have a neural network trained with SGD (stochastic gradient descent) with learning rate $\alpha$.

Each iteration, I update the weights with the rule:

$$\Delta \vec{w} = -\alpha \frac{\partial E}{\partial \vec{w}}$$

I know the following quantities: $E$ (softmax or MSE error, averaged over all samples in the iteration), $\vec{w}$ (current weights), and $\frac{\partial E}{\partial \vec{w}}$ (error gradient, averaged over all samples in the iteration).
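For reference, the update rule above in a minimal NumPy sketch (`sgd_step` is a hypothetical helper name, not from any library):

```python
import numpy as np

def sgd_step(w, grad, alpha):
    """One SGD update: w <- w + delta_w, where delta_w = -alpha * dE/dw."""
    return w - alpha * grad

# toy example: E(w) = ||w||^2 / 2, so dE/dw = w
w = np.array([1.0, -2.0])
w = sgd_step(w, grad=w, alpha=0.1)  # -> [0.9, -1.8]
```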

Very often $\lvert\Delta \vec{w}\rvert \ll E$: the weights change too slowly to learn anything.

Sometimes $\lvert\Delta \vec{w}\rvert \gg E$: the weights change too fast, which is also not good.

To avoid this, I have to manually select a learning rate for each layer of the network, which is a real waste of time.

Can you advise a good method to select the $\alpha$'s automatically?

Best answer:

You can use a line search (e.g. backtracking with a sufficient-decrease condition) at each step to pick $\alpha$ automatically: http://www.cs.cmu.edu/~ggordon/10725-F12/scribes/10725_Lecture5.pdf
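A backtracking line search with the Armijo condition could look like this (a minimal NumPy sketch under that assumption; `backtracking` and the constants `rho`, `c` are illustrative choices, not from the linked notes verbatim):

```python
import numpy as np

def backtracking(E, grad_E, w, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds:
       E(w - alpha * g) <= E(w) - c * alpha * ||g||^2."""
    g = grad_E(w)
    alpha = alpha0
    while E(w - alpha * g) > E(w) - c * alpha * np.dot(g, g):
        alpha *= rho  # step too large: back off geometrically
    return alpha

# toy quadratic: E(w) = ||w||^2 / 2, whose gradient is w itself
E = lambda w: 0.5 * np.dot(w, w)
grad_E = lambda w: w
w = np.array([3.0, -4.0])
alpha = backtracking(E, grad_E, w)
w_new = w - alpha * grad_E(w)
```

On this quadratic the full step `alpha0 = 1.0` already satisfies the condition and lands exactly at the minimum; on harder objectives the loop backs off until the decrease is sufficient.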

Or something fancier like AdaGrad, which is actually easy to implement: http://www.magicbroom.info/Papers/DuchiHaSi10.pdf
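AdaGrad gives each weight its own effective learning rate by dividing a base rate by the root of the accumulated squared gradients. A minimal sketch (the class name and hyperparameter defaults are illustrative):

```python
import numpy as np

class AdaGrad:
    """Per-weight adaptive step: w <- w - alpha * g / (sqrt(sum g^2) + eps)."""
    def __init__(self, alpha=0.5, eps=1e-8):
        self.alpha, self.eps = alpha, eps
        self.G = None  # running sum of squared gradients, per weight

    def step(self, w, grad):
        if self.G is None:
            self.G = np.zeros_like(w)
        self.G += grad ** 2
        return w - self.alpha * grad / (np.sqrt(self.G) + self.eps)

# toy use: minimize E(w) = ||w||^2 / 2, whose gradient is w
opt = AdaGrad(alpha=0.5)
w = np.array([2.0, -3.0])
for _ in range(100):
    w = opt.step(w, w)  # |w| shrinks toward 0
```

Weights that keep receiving large gradients get their step sizes damped, so one global base rate often works across all layers.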