How to choose how to shift the inputs when using steepest descent?


I am following Geoffrey Hinton's course "Neural Networks for Machine Learning", and he says that in order to speed up learning when using steepest descent, we can shift the inputs. In his example, if we have the two training cases 101, 101 -> 2 and 101, 99 -> 0, then subtracting 100 from each input gives 1, 1 -> 2 and 1, -1 -> 0. That changes the shape of the error surface from an elongated ellipse to a circle. So far so good.
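To see the effect numerically, here is a small sketch (my own, not from the course): for a linear neuron with squared error, the curvature of the error surface is given by X^T X, and its condition number measures how elongated the elliptical contours are (1 means a perfect circle).

```python
import numpy as np

# Hinton's two training cases: inputs (101, 101) -> 2 and (101, 99) -> 0.
X_raw = np.array([[101.0, 101.0],
                  [101.0,  99.0]])

# Shift each input by 100, as in the slide.
X_shifted = X_raw - 100.0

# Condition number of X^T X: how stretched the error ellipse is.
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_shifted = np.linalg.cond(X_shifted.T @ X_shifted)

print(cond_raw)      # very large: a long, narrow ellipse
print(cond_shifted)  # 1.0: circular contours
```

With circular contours, the steepest-descent direction points straight at the minimum, so the same learning rate makes much faster progress.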

Then he says: "It usually helps to shift each component of the input so that, averaged over all the training data, it has a value of zero. That is, make sure its mean is zero." Does anyone have a clue why this is? And is this something he uses in his example?
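For what it's worth, the mean-zero recipe itself is just a per-component shift by the training mean. A minimal sketch with synthetic data (hypothetical, not from the course), again using the condition number of X^T X as a proxy for how elongated the error surface is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs whose components have large, unequal means.
X = rng.normal(size=(200, 2)) + np.array([100.0, 50.0])

# Shift each component so its mean over the training data is zero.
X_centered = X - X.mean(axis=0)

print(X_centered.mean(axis=0))  # ~ [0, 0]

# Rounder contours (smaller condition number) after centering.
print(np.linalg.cond(X.T @ X))
print(np.linalg.cond(X_centered.T @ X_centered))
```

In his slide the shift is 100 for both components rather than the exact means (101 and 100), so it is the same idea applied approximately.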

Below is the slide:

Neural Networks for Machine Learning - Geoffrey Hinton

1 Answer


If you look at the activation functions, the interesting part happens around zero. For the logistic function, the gradient is largest around zero; the farther you get from zero, the smaller the gradient becomes. Hence the weight updates become smaller, and training becomes slower.
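A quick check of this claim for the logistic function, whose derivative is sigma'(z) = sigma(z) * (1 - sigma(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum
print(sigmoid_grad(5.0))   # ~0.0066, already ~40x smaller
print(sigmoid_grad(-5.0))  # same by symmetry
```

So if the inputs have a large mean, the weighted sums feeding the units tend to sit far from zero, where the gradient (and hence each weight update) is tiny; zero-mean inputs keep the units in the high-gradient regime.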