easy explanation of Newton's method


I was reading a tutorial on training a neural network using Newton's method, and it says, "The maximum error reduction (of the error surface function) depends on the ratio of the gradient to the curvature. So, a good direction to move in is one with a high ratio of gradient to curvature, even if the gradient itself is small"

Can anybody give an intuitive explanation of why a direction with a high ratio of gradient to curvature is a good one to move in?


There are 2 answers below.

BEST ANSWER

I'm assuming that by 'curvature' they mean the second derivative. The formula is simple to derive. Suppose that $y$ is the actual minimum of $f$ and $x$ is the current point. Then the second-order Taylor approximation of $f$ about $x$ gives

$$f(y) \approx f(x) + f'(x)(y-x) + \frac{1}{2}f''(x)(y-x)^2$$

Differentiating with respect to $y$ and using that $f'(y)=0$ gives

$$0 = f'(y) \approx f'(x) + f''(x)(y-x),$$

and solving for $y$ yields

$$y \approx x - \frac{f'(x)}{f''(x)}.$$

So the Newton step is exactly the ratio of the gradient to the curvature, and it gives the maximum decrease in the error according to a local quadratic approximation of the error function. When that ratio is large, the quadratic model predicts a big reduction in error, even if the gradient alone is small.
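As a concrete illustration, here is a minimal Python sketch of this 1D Newton update, $x \leftarrow x - f'(x)/f''(x)$; the error function, starting point, and iteration count are my own made-up choices for illustration, not from the question.

```python
# Minimal sketch: 1D Newton's method for minimization,
#   x_{n+1} = x_n - f'(x_n) / f''(x_n).
# The "error surface" f below is a toy example (convex, so f'' > 0 everywhere).

def f(x):
    return (x - 3.0) ** 2 + 0.1 * x ** 4   # toy error function

def df(x):
    return 2.0 * (x - 3.0) + 0.4 * x ** 3  # first derivative (gradient)

def d2f(x):
    return 2.0 + 1.2 * x ** 2              # second derivative (curvature)

x = 0.0  # arbitrary starting point
for i in range(10):
    step = df(x) / d2f(x)   # ratio of gradient to curvature
    x = x - step
    print(f"iter {i}: x = {x:.6f}, f(x) = {f(x):.6f}")
```

Running this, $x$ converges in a few iterations to the minimizer of the quadratic-plus-quartic toy function, each step sized by the gradient-to-curvature ratio.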

ANSWER

Newton's method uses only the first derivative of the operator to approximate its behavior, i.e., it builds a local linear model and uses it to estimate how to adjust the inputs to reach the desired outputs. If the operator is actually close to linear, this estimate works well. If the operator is highly nonlinear, i.e., it has a lot of curvature or large higher derivatives, then that curvature has a large effect that the linear estimate does not take into account, and the estimate will be poor; a numerical sketch of this is below.
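To make this concrete, here is a small Python sketch, using made-up example functions of my own choosing, comparing one Newton root-finding step $x \leftarrow x - g(x)/g'(x)$ on a linear operator (where the step is exact) and on a highly curved one (where it is not).

```python
import math

# One Newton root-finding step: exact for a linear operator,
# only approximate when curvature is large. Both g's are toy examples.

def newton_step(g, dg, x):
    return x - g(x) / dg(x)

# Linear case: g1(x) = 2x - 4, root at x = 2.
g1, dg1 = lambda x: 2 * x - 4, lambda x: 2.0
print(newton_step(g1, dg1, 10.0))   # 2.0: one step lands exactly on the root

# Highly nonlinear case: g2(x) = e^x - 4, root at ln(4) ~ 1.386.
g2, dg2 = lambda x: math.exp(x) - 4, lambda x: math.exp(x)
print(newton_step(g2, dg2, 10.0))   # ~9.0002: still far from the root
```

In the nonlinear case the local linear model at $x = 10$ badly misjudges where the root lies, because the exponential's curvature makes the tangent line a poor global approximation.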