Gradient descent with adaptive learning rate


I have a neural network trained with SGD (stochastic gradient descent) with learning rate $\alpha$.

Each iteration, I update the weights with the rule:

$$\Delta \vec{w} = -\alpha \frac{\partial E}{\partial \vec{w}}$$

I know the following quantities: $E$ (softmax or MSE error, averaged over all samples in the iteration), $\vec{w}$ (current weights), and $\frac{\partial E}{\partial \vec{w}}$ (error gradient, averaged over all samples in the iteration).
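For reference, the update rule above in a minimal NumPy sketch (`sgd_step` is a hypothetical helper name, not from any library):

```python
import numpy as np

def sgd_step(w, grad, alpha):
    """One SGD update: w <- w + delta_w, where delta_w = -alpha * dE/dw."""
    return w - alpha * grad

# toy example: E(w) = ||w||^2 / 2, so dE/dw = w
w = np.array([1.0, -2.0])
w = sgd_step(w, grad=w, alpha=0.1)  # -> [0.9, -1.8]
```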

Very often $\lvert\Delta \vec{w}\rvert \ll E$: the weights change too slowly to learn anything.

Sometimes $\lvert\Delta \vec{w}\rvert \gg E$: the weights change too fast, which is also not good.

To avoid this, I have to manually select a learning rate for each layer of the network, which is a real waste of time.

Can you advise a good method to select the $\alpha$'s automatically?

Best answer:

You can use a line search (e.g. backtracking with a sufficient-decrease condition) at each step to pick $\alpha$ automatically: http://www.cs.cmu.edu/~ggordon/10725-F12/scribes/10725_Lecture5.pdf
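A backtracking line search with the Armijo condition could look like this (a minimal NumPy sketch under that assumption; `backtracking` and the constants `rho`, `c` are illustrative choices, not from the linked notes verbatim):

```python
import numpy as np

def backtracking(E, grad_E, w, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds:
       E(w - alpha * g) <= E(w) - c * alpha * ||g||^2."""
    g = grad_E(w)
    alpha = alpha0
    while E(w - alpha * g) > E(w) - c * alpha * np.dot(g, g):
        alpha *= rho  # step too large: back off geometrically
    return alpha

# toy quadratic: E(w) = ||w||^2 / 2, whose gradient is w itself
E = lambda w: 0.5 * np.dot(w, w)
grad_E = lambda w: w
w = np.array([3.0, -4.0])
alpha = backtracking(E, grad_E, w)
w_new = w - alpha * grad_E(w)
```

On this quadratic the full step `alpha0 = 1.0` already satisfies the condition and lands exactly at the minimum; on harder objectives the loop backs off until the decrease is sufficient.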

Or something fancier like AdaGrad, which is actually easy to implement: http://www.magicbroom.info/Papers/DuchiHaSi10.pdf
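AdaGrad gives each weight its own effective learning rate by dividing a base rate by the root of the accumulated squared gradients. A minimal sketch (the class name and hyperparameter defaults are illustrative):

```python
import numpy as np

class AdaGrad:
    """Per-weight adaptive step: w <- w - alpha * g / (sqrt(sum g^2) + eps)."""
    def __init__(self, alpha=0.5, eps=1e-8):
        self.alpha, self.eps = alpha, eps
        self.G = None  # running sum of squared gradients, per weight

    def step(self, w, grad):
        if self.G is None:
            self.G = np.zeros_like(w)
        self.G += grad ** 2
        return w - self.alpha * grad / (np.sqrt(self.G) + self.eps)

# toy use: minimize E(w) = ||w||^2 / 2, whose gradient is w
opt = AdaGrad(alpha=0.5)
w = np.array([2.0, -3.0])
for _ in range(100):
    w = opt.step(w, w)  # |w| shrinks toward 0
```

Weights that keep receiving large gradients get their step sizes damped, so one global base rate often works across all layers.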