I am looking for an intuition for the following behavior:
Let $f(x)=x^2$
Apparently the Newton's method iterations $x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$ converge relatively slowly to the root (at $x=0$) in this case. I was told this happens because the derivative at the root is also zero: $f'(0)=0$.
However, I don't understand why the derivative would matter:
$$x_{n+1}=x_n-\frac{x_n^2}{2x_n}=\frac{x_n}{2}$$ I would say that because the root is $0$, $x_n$ becomes smaller and smaller, and therefore the steps towards the 'right' answer become smaller and smaller.
a) Is this reasoning correct?
b) What does all this have to do with the derivative?
I created some graphics:
First $f(x)=x^2$ with $x_0=2$
And $f(x)=x^2-1$ with $x_0=3$
Indeed, with the double root the convergence goes slower. Still I do not really have an intuition for this behavior. I can see the calculations confirm this result, but why is this also intuitively true?
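To make the comparison concrete, here is a quick Python sketch of the same two experiments (the step count is arbitrary):

```python
# Newton's method on f(x) = x^2 with x0 = 2 (double root at 0)
# versus f(x) = x^2 - 1 with x0 = 3 (simple root at 1).

def newton(f, fp, x, steps):
    """Run `steps` Newton iterations starting from x."""
    for _ in range(steps):
        x = x - f(x) / fp(x)
    return x

double = newton(lambda x: x * x, lambda x: 2 * x, 2.0, 10)      # error only halves each step
simple = newton(lambda x: x * x - 1, lambda x: 2 * x, 3.0, 10)  # essentially exactly 1.0
print(double, simple)
```

After ten steps the double-root iterate is still $2/2^{10}$ away from $0$, while the simple-root iterate has converged to machine precision.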



Try running your argument for the function
$$ f(x) = (x-1)^2 $$
You will quickly see that it is the derivative at the root, and not the value of the root, that matters.
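A small Python sketch of that experiment (the starting point $x_0 = 3$ is my own choice): the error $x_n - 1$ halves each step even though the root sits at $1$, not $0$.

```python
# Newton's method on f(x) = (x - 1)^2, whose double root is at x = 1.
# The error x_n - 1 still halves each step, so the slow convergence is
# caused by the vanishing derivative at the root, not by the root being 0.

def newton_step(x):
    f = (x - 1) ** 2
    fp = 2 * (x - 1)
    return x - f / fp

x = 3.0
errors = []
for _ in range(5):
    x = newton_step(x)
    errors.append(x - 1)

print(errors)  # each error is exactly half the previous one
```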
Okay: first, note that in Newton's method, unless you accidentally hit the root exactly in finitely many steps, the steps always get smaller and smaller. So the qualitative fact that the steps get smaller says nothing about the rate of convergence.
The key is this:
Consider a polynomial (for general functions, this would be the Taylor polynomial near a root $f(b) = 0$)
$$ f(x) = a_1 (x-b) + a_2 (x-b)^2 + a_3 (x-b)^3 $$
which has a root at $x = b$.
Simple root case
Consider the simple root case $a_1 \neq 0$. We have that the steps
$$ (x-b) - \frac{f(x)}{f'(x)} = \frac{(x-b) f'(x) - f(x)}{f'(x)} = \frac{ a_2 (x-b)^2 + 2 a_3 (x-b)^3}{a_1 + 2 a_2 (x-b) + 3 a_3 (x-b)^2} $$
So this means that when $x_n$ is very close to the root $b$, that
$$ x_{n+1} - b = (x_n-b) - \frac{f(x_n)}{f'(x_n)} = \frac{a_2}{a_1} (x_n-b)^2 + o((x_n - b)^2) $$
That is, if the error $\epsilon_n = (x_n - b)$ at step $n$ is small, then the error $\epsilon_{n+1} \approx \frac{a_2}{a_1} \epsilon_n^2$ is much smaller.
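This quadratic error law can be checked numerically. For $f(x) = x^2 - 1$ near the simple root $b = 1$ we have $a_1 = f'(1) = 2$ and $a_2 = 1$, so the ratio $\epsilon_{n+1}/\epsilon_n^2$ should approach $a_2/a_1 = \frac12$ (the starting point below is my own choice):

```python
# Track eps_{n+1} / eps_n^2 for Newton's method on f(x) = x^2 - 1,
# which has a simple root at b = 1. The ratio should tend to a_2/a_1 = 0.5.

x = 1.5
ratios = []
for _ in range(4):
    eps = x - 1
    x = x - (x * x - 1) / (2 * x)
    ratios.append((x - 1) / eps**2)

print(ratios)  # approaches 0.5 as x_n nears the root
```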
Double root case
What if $a_1 = 0$ and $a_2 \neq 0$? The same computations give
$$ (x-b) - \frac{f(x)}{f'(x)} = \frac{a_2(x-b)^2 + 2 a_3(x-b)^3}{2 a_2(x-b) + 3a_3(x-b)^2} $$
which, when $x_n$ is very close to $b$, gives
$$ x_{n+1} - b = (x_n - b) - \frac{f(x_n)}{f'(x_n)} = \frac12 (x_n-b) + o(x_n-b) $$
that is to say, the error decays like
$$ \epsilon_{n+1} \approx \frac12 \epsilon_n $$
Now the difference is clear: for the same starting (small) error $\epsilon$, in the simple root case the new error after running the algorithm for one step is only $\frac{a_2}{a_1} \epsilon^2$; in the double root case the new error is $\frac12 \epsilon$, which is much bigger whenever $\epsilon \ll \frac{|a_1|}{2|a_2|}$.
In words: in the simple root case Newton's method has the property that "the closer you are to the root, the higher the percentage of the error corrected in each step", with the percentage going toward 100%. In the double root case, by contrast, Newton's method has the property that "when close to the root, only about half of the error can be corrected at most in each step".
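The "percentage corrected per step" can be watched directly by tracking $\epsilon_{n+1}/\epsilon_n$ (the functions and the starting distance of $1$ from each root are my own choices):

```python
# Ratio eps_{n+1}/eps_n per step: simple root of f(x) = x^2 - 1 at 1
# versus double root of g(x) = x^2 at 0.

def step_ratios(f, fp, root, x, steps):
    """Return the per-step error ratios of Newton's method."""
    out = []
    for _ in range(steps):
        eps = x - root
        x = x - f(x) / fp(x)
        out.append((x - root) / eps)
    return out

simple_r = step_ratios(lambda t: t * t - 1, lambda t: 2 * t, 1.0, 2.0, 4)
double_r = step_ratios(lambda t: t * t, lambda t: 2 * t, 0.0, 2.0, 4)
print(simple_r)  # shrinks toward 0: an ever-higher percentage is corrected
print(double_r)  # stays at 0.5: only half the error is corrected each step
```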
Double root case revisited
Now, if we use the formula given by Felix Mann in his comment, we can rework the approximation using the same polynomial as above. Assume $a_1 = 0$ and $a_2 \neq 0$.
$$ (x-b) - 2 \frac{f(x)}{f'(x)} = \frac{a_3(x-b)^3}{2a_2(x-b) + 3a_3(x-b)^2} $$
so starting with error $\epsilon_n = x_n - b$, your new error is $$ \epsilon_{n+1} \approx \frac{a_3}{2a_2} \epsilon_n^2 $$ and you get again the property that "the closer you get to the double root, the higher the percentage of error corrected per step". But notice that if the root were actually simple (meaning $a_1 \neq 0$), then you would get
$$ (x-b) - 2 \frac{f(x)}{f'(x)} = \frac{- a_1(x-b) + a_3(x-b)^3}{a_1 + 2a_2(x-b) + 3a_3(x-b)^2} $$
and we arrive at a situation where "if the initial error is small, the error of the next step is approximately the same size!" And you won't even converge to the root. (So the modified formula really should only be used when you know for sure that the root is a double root.)
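Both behaviors can be sketched numerically. Below I apply the modified step $x - 2f(x)/f'(x)$ to $f(x) = x^2 + x^3$ (double root at $0$ with $a_2 = a_3 = 1$, so the prediction is $\epsilon_{n+1} \approx \epsilon_n^2/2$) and then, as a cautionary example of my own, to the simple root of $g(x) = x$, where the step just flips the sign of the error forever:

```python
# Modified Newton step x - 2 f(x)/f'(x), intended for double roots.

def modified_step(f, fp, x):
    return x - 2 * f(x) / fp(x)

# Double root: f(x) = x^2 + x^3, root 0. Errors should decay quadratically.
x = 0.5
errs = []
for _ in range(4):
    x = modified_step(lambda t: t * t + t**3, lambda t: 2 * t + 3 * t * t, x)
    errs.append(x)

# Simple root: g(x) = x, root 0. The step maps y to -y: it never converges.
y = 0.5
for _ in range(4):
    y = modified_step(lambda t: t, lambda t: 1.0, y)

print(errs, y)  # errs shrink rapidly; y just oscillates between +/- 0.5
```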
Intuition
The very first thing you need to remember is that Newton's method converges in one step for linear functions. That is, if you look at Newton's method, each step proceeds by approximating the curve by a triangle with height $f(x_n)$ and slope $f'(x_n)$ and correcting the guess $x_n$ by the corresponding base $f(x_n) / f'(x_n)$.
Simple root case
In the simple root case, when we zoom into the graph near its root, the graph straightens out. So it becomes more and more like a triangle where we can find the base by dividing the height by the slope. Because the graph itself converges toward the triangle, Newton's method becomes more and more effective the closer you are to the root.
(The above animation zooms in on the origin of the graph of $y = (x+0.1)^3 - 0.001$, which has a simple root at $x = 0$.)
Multiple root case
In the multiple root case, zooming in doesn't straighten out the curve at all. The picture always remains far from a triangle. In particular, as we zoom in you don't get the gain in effectiveness of the triangle approximation that we saw in the simple root case.
(The above animation zooms in on the origin of the graph of $y = x^3$, which has a triple root at $x = 0$.)
The Schröder approach mentioned in the comments below overcomes this by recognizing that the "shape" the graph converges to is in fact $y = x^m$, where $m$ is the multiplicity. So instead of approximating by triangles (the linear case) one can approximate by these higher-order curves. Again, since the graph becomes more and more like the model behavior of our approximation as we zoom in, we get the same gain in effectiveness.
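A sketch of this idea, using the step $x_{n+1} = x_n - m\,f(x_n)/f'(x_n)$ for a root of multiplicity $m$ (the test function $f(x) = x^3$ is my own choice; since its graph *is* the model curve $y = x^3$, the $m = 3$ step lands on the root exactly):

```python
# Multiplicity-aware Newton step on f(x) = x^3, a triple root at 0.
# Plain Newton removes only a third of the error per step; the m = 3
# step is exact for a pure power.

f = lambda x: x**3
fp = lambda x: 3 * x * x

x = 1.0
plain = []
for _ in range(3):
    x = x - f(x) / fp(x)
    plain.append(x)          # roughly 2/3, 4/9, 8/27

y = 1.0
y = y - 3 * f(y) / fp(y)     # exact for a pure power

print(plain, y)
```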