How is a local minimum possible in gradient descent?


Gradient descent is usually introduced with the mean squared error, which looks like the equation of a parabola, y = x^2.

Yet we often say that weight adjustment in a neural network via gradient descent can hit a local minimum and get stuck there.

My question is: how is a local minimum possible on a parabola, which is convex and has only a single minimum?


1 Answer


The loss is only parabolic in the model's *output*, and locally parabolic close to a minimum. As a function of the *weights*, though, it can have as many minima as you want!
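To make this concrete, here is a minimal sketch (my own illustration, not from the answer) using a one-weight nonlinear model f_w(x) = sin(w·x). The MSE is quadratic in the prediction, but scanning it as a function of the weight w reveals several local minima:

```python
import numpy as np

# Toy regression task: data generated by y = sin(2 x).
# Hypothetical one-weight model: f_w(x) = sin(w x).
x = np.linspace(0.0, 3.0, 50)
y = np.sin(2.0 * x)

def mse(w):
    """Mean squared error of the model sin(w x) against the data."""
    return np.mean((np.sin(w * x) - y) ** 2)

# Scan the loss landscape over a grid of weight values.
ws = np.linspace(0.1, 10.0, 1000)
losses = np.array([mse(w) for w in ws])

# Count interior grid points that are strict local minima.
is_local_min = (losses[1:-1] < losses[:-2]) & (losses[1:-1] < losses[2:])
n_local_minima = int(is_local_min.sum())

print("loss at w = 2:", mse(2.0))          # global minimum, loss is exactly 0
print("local minima on the grid:", n_local_minima)
```

Gradient descent started near the wrong basin would converge to one of the shallow minima instead of w = 2, even though the loss is a perfect parabola in the prediction error.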

Think of a total least-squares line-fitting problem where the data are just four points forming a square. By symmetry, the diagonals and the medians all fit equally well, so the minimizer cannot be unique: the loss landscape has several equally good solutions rather than one isolated minimum.
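The symmetry argument can be checked numerically. The sketch below (my own, assuming NumPy) places the four points at (±1, ±1) and measures the total least-squares loss, i.e. the sum of squared perpendicular distances to a line through the centroid at angle theta. A median and a diagonal give exactly the same loss, so no single line can be the unique best fit:

```python
import numpy as np

# Four points at the corners of a square, centered at the origin
# (the origin is their centroid, which a TLS line must pass through).
pts = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)

def tls_loss(theta):
    """Sum of squared perpendicular distances from the points to the
    line through the origin with direction (cos(theta), sin(theta))."""
    normal = np.array([-np.sin(theta), np.cos(theta)])
    return float(np.sum((pts @ normal) ** 2))

# Compare a median (theta = 0) with a diagonal (theta = pi/4).
print(tls_loss(0.0), tls_loss(np.pi / 4))  # both equal 4.0
```

For this perfectly symmetric configuration the two candidate lines tie exactly; any gradient-based fit can settle on either one depending on where it starts.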