I have a neural network trained with SGD (stochastic gradient descent) with learning rate $\alpha$.
On each iteration I update the weights with the rule:
$$\Delta \vec{w} = -\alpha \frac{\partial E}{\partial \vec{w}}$$
I know the following quantities: $E$ (softmax or MSE error, averaged over all samples of the iteration), $\vec{w}$ (the current weights), and $\frac{\partial E}{\partial \vec{w}}$ (the error gradient, also averaged over all samples of the iteration).
Very often $\lvert\Delta \vec{w}\rvert \ll E$: the weights change too slowly to learn anything.
Sometimes $\lvert\Delta \vec{w}\rvert \gg E$: the weights change too fast, which is also not good.
To avoid this, I have to manually select a learning rate for each layer of the network, which is a real waste of time.
Can you advise me a good method to select the $\alpha$'s automatically?
You can use a line search or something similar at each step: http://www.cs.cmu.edu/~ggordon/10725-F12/scribes/10725_Lecture5.pdf
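As a concrete illustration, here is a minimal sketch of a backtracking line search with the Armijo sufficient-decrease condition (the standard scheme described in the lecture notes above). The function name and the specific constants (`alpha0`, `beta`, `c`) are my own choices, not anything from the original post:

```python
import numpy as np

def backtracking_alpha(loss, w, grad, alpha0=1.0, beta=0.5, c=1e-4):
    """Backtracking line search: start from alpha0 and shrink alpha
    by a factor beta until the gradient step gives sufficient decrease
    (the Armijo condition)."""
    alpha = alpha0
    f0 = loss(w)
    g2 = np.dot(grad, grad)  # squared gradient norm
    while loss(w - alpha * grad) > f0 - c * alpha * g2:
        alpha *= beta
    return alpha
```

The cost is one or more extra loss evaluations per step, so this is most practical when the loss is evaluated on the same mini-batch as the gradient.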
Or fancier methods like AdaGrad (which is actually easy to implement): http://www.magicbroom.info/Papers/DuchiHaSi10.pdf