How to choose how to shift the inputs when using steepest descent?


I am following Geoffrey Hinton's course "Neural Networks for Machine Learning", and he says that in order to speed up learning when using steepest descent, we can shift the inputs. In his example, if we have the two training cases 101, 101 -> 2 and 101, 99 -> 0, then subtracting 100 from each input gives 1, 1 -> 2 and 1, -1 -> 0. That changes the shape of the error surface from an elongated ellipse to a circle. So far so good.
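To see the effect numerically, here is a small sketch (my own, not from the course): for a linear neuron with squared error, the curvature of the error surface is given by X^T X, and its condition number measures how elongated the elliptical contours are (1 means a perfect circle).

```python
import numpy as np

# Hinton's two training cases: inputs (101, 101) -> 2 and (101, 99) -> 0.
X_raw = np.array([[101.0, 101.0],
                  [101.0,  99.0]])

# Shift each input by 100, as in the slide.
X_shifted = X_raw - 100.0

# Condition number of X^T X: how stretched the error ellipse is.
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_shifted = np.linalg.cond(X_shifted.T @ X_shifted)

print(cond_raw)      # very large: a long, narrow ellipse
print(cond_shifted)  # 1.0: circular contours
```

With circular contours, the steepest-descent direction points straight at the minimum, so the same learning rate makes much faster progress.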

Then he says: "It usually helps to shift each component of the input so that, averaged over all the training data, it has a value of zero. That is, make sure its mean is zero." Does anyone have a clue why this is? And is this something he uses in his example?
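For what it's worth, the mean-zero recipe itself is just a per-component shift by the training mean. A minimal sketch with synthetic data (hypothetical, not from the course), again using the condition number of X^T X as a proxy for how elongated the error surface is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs whose components have large, unequal means.
X = rng.normal(size=(200, 2)) + np.array([100.0, 50.0])

# Shift each component so its mean over the training data is zero.
X_centered = X - X.mean(axis=0)

print(X_centered.mean(axis=0))  # ~ [0, 0]

# Rounder contours (smaller condition number) after centering.
print(np.linalg.cond(X.T @ X))
print(np.linalg.cond(X_centered.T @ X_centered))
```

In his slide the shift is 100 for both components rather than the exact means (101 and 100), so it is the same idea applied approximately.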

Below is the slide:

Neural Networks for Machine Learning - Geoffrey Hinton

1 Answer


If you look at the activation functions, the interesting part happens around zero. For the logistic function, the gradient is largest around zero; the farther you get from zero, the smaller the gradient becomes. Hence the weight updates become smaller, and training becomes slower.
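A quick check of this claim for the logistic function, whose derivative is sigma'(z) = sigma(z) * (1 - sigma(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum
print(sigmoid_grad(5.0))   # ~0.0066, already ~40x smaller
print(sigmoid_grad(-5.0))  # same by symmetry
```

So if the inputs have a large mean, the weighted sums feeding the units tend to sit far from zero, where the gradient (and hence each weight update) is tiny; zero-mean inputs keep the units in the high-gradient regime.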