In neural networks, we search for successively better weights by minimizing some error function, typically with gradient descent. The error function need not be convex with respect to the weights of the network.
Recently I came across the use of Newton's root-finding method for minimizing the error function: in each step, the function is approximated by a quadratic, which is then minimized.
Let $f$ be the error function to minimize. Newton's method goes like this: approximate $f$ near the current point $a_1$ by its second-order Taylor polynomial,

$f(a_1+x) \approx f(a_1) + f'(a_1)\,x + \frac{f''(a_1)}{2}x^2.$
Minimizing this quadratic in $x$ (setting its derivative to zero, $f'(a_1) + f''(a_1)\,x = 0$) gives the step

$x=-\frac{f'(a_1)}{f''(a_1)}$
and so
$a_2 = a_1 -\frac{f'(a_1)}{f''(a_1)}$
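As a sanity check, here is a minimal Python sketch of this update rule. The function $f(x) = x^3 - x$ is my own example; it has a local minimum at $x = 1/\sqrt{3}$, and the iteration is started in the convex region around it:

```python
def newton_step(x, df, d2f):
    # One Newton update: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    return x - df(x) / d2f(x)

# Example function f(x) = x^3 - x, so f'(x) = 3x^2 - 1 and f''(x) = 6x
df = lambda x: 3 * x**2 - 1
d2f = lambda x: 6 * x

x = 0.6  # start in the convex region (f''(x) > 0)
for _ in range(20):
    x = newton_step(x, df, d2f)

print(x)  # converges to the local minimum at 1/sqrt(3) ≈ 0.5774
```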
If $f$ is concave around $a_1$ (i.e. $f''(a_1) < 0$), then the update moves in the direction of a local maximum instead of a local minimum.
My question is: how can this method ever work if we find ourselves in a concave region of the error function? There, Newton's method directs us toward a local maximum, which is definitely undesirable.
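To make the failure mode concrete, here is a small Python sketch using my own example function $f(x) = x^3 - x$, which is concave for $x < 0$ (since $f''(x) = 6x < 0$ there). Started in that region, the Newton iteration converges to the local maximum at $-1/\sqrt{3}$:

```python
def newton_step(x, df, d2f):
    # Newton update: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    return x - df(x) / d2f(x)

# f(x) = x^3 - x: f'(x) = 3x^2 - 1, f''(x) = 6x (negative for x < 0)
df = lambda x: 3 * x**2 - 1
d2f = lambda x: 6 * x

x = -0.6  # start where f is concave (f''(x) < 0)
for _ in range(20):
    x = newton_step(x, df, d2f)

print(x)  # ≈ -0.5774 = -1/sqrt(3): the local MAXIMUM of f
```

The step $-f'/f''$ flips sign together with $f''$, so the iteration happily climbs toward the stationary point of the quadratic model, whether that is a minimum or a maximum.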
How is this case handled in neural networks? Am I missing something?