Why does Newton's method of successive approximation for computing square roots work?


The algorithm is given as follows:

1. Begin by guessing that the square root is x / 2. Call that guess g.

2. The actual square root must lie between g and x/g. At each step of the successive approximation, generate a new guess by averaging g and x/g.

3. Repeat step 2 until the values of g and x/g are as close together as the precision of the hardware allows. In Java, the best way to check for this condition is to test whether the average is equal to either of the values used to generate it.
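The steps above can be sketched in Java roughly as follows (the class and method names are my own, not from any particular textbook; the sketch assumes x > 0, since x = 0 would divide by zero and a negative x has no real square root):

```java
// Sketch of the described algorithm. Assumes x > 0.
public class SquareRoot {
    public static double sqrt(double x) {
        double g = x / 2;                     // step 1: initial guess
        while (true) {
            double avg = (g + x / g) / 2;     // step 2: average g and x/g
            // step 3: stop when the average equals one of the values
            // used to generate it (hardware-precision limit reached)
            if (avg == g || avg == x / g) {
                return avg;
            }
            g = avg;
        }
    }

    public static void main(String[] args) {
        System.out.println(sqrt(2.0));  // close to 1.41421356...
        System.out.println(sqrt(9.0));  // close to 3.0
    }
}
```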

My question concerns steps 2 and 3. Why do we make the new guess (g + x/g) / 2, and why do we compare this result to g and x/g to decide whether to terminate the loop?


Best answer:

Newton's method approximates a zero of the function $f$ by iterating $$ x \mapsto x - \frac{f(x)}{f'(x)} $$ which has the geometric interpretation of finding the intersection between the $x$-axis and the tangent through the point $(x,f(x))$.

In the particular case of the square root, we use $f(x)=x^2-a$, whose zeroes are exactly at $x=\pm\sqrt a$. Then it so happens that the iteration formula simplifies: $$ x - \frac{f(x)}{f'(x)} = x - \frac{x^2-a}{2x} = \frac{x + a/x}2 $$ so finding "the point where the tangent intersects the $x$-axis" and taking "the average of $x$ and $a/x$" turn out to produce the same result.

But it's only the former description that's Newton's method. The latter is specific to square roots, and should only be called "Newton's method" if we have actually derived it in the above way.

If, on the other hand, we arrive at $x\mapsto\frac{x+a/x}2$ by some other kind of heuristic or geometric reasoning, then what we get is just some approximation, which strictly speaking has nothing to do with Newton -- though observing that it is numerically equal to Newton's method can help explain why it works as well as it does.
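To make the equivalence concrete, here is a small numerical check (class and method names are mine): one generic Newton step for $f(x)=x^2-a$ compared against the averaged form $(x+a/x)/2$. The two agree up to floating-point rounding, as the algebra above predicts.

```java
// Compare a generic Newton step for f(x) = x^2 - a with the
// square-root-specific averaged form (x + a/x)/2.
public class NewtonStep {
    // Generic Newton step: x - f(x)/f'(x), with f(x) = x^2 - a
    static double generic(double x, double a) {
        double f = x * x - a;   // f(x)  = x^2 - a
        double fp = 2 * x;      // f'(x) = 2x
        return x - f / fp;
    }

    // Simplified form specific to square roots
    static double averaged(double x, double a) {
        return (x + a / x) / 2;
    }

    public static void main(String[] args) {
        double a = 2.0, x = 1.5;
        System.out.println(generic(x, a));   // both print (nearly) the
        System.out.println(averaged(x, a));  // same value
    }
}
```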

Answer:

Newton's method for square roots is a special case of Newton's method in general, as @HenningMakholm discussed. Heuristically, Newton's method should work because a nice function is well approximated by its tangent lines, provided you look close to the point of tangency. This means that the roots of a nonlinear function are well approximated by the roots of its tangent lines when the points of tangency are chosen close enough to the root. However, when you try to turn that argument into a proof, you find that it only works for sufficiently good initial guesses, whereas Newton's method for square roots converges for any positive initial guess.

We can heuristically argue that the convergence should be global using what you've already said: $\sqrt{x}$ is between $g$ and $x/g$ for any positive $g$, and Newton's method for square roots ultimately just averages these two numbers. So intuitively the error should at least get cut by a factor of $1/2$ at each step. Directly refining this heuristic into a proof is a bit tricky, though, because you move both endpoints of the "bracketing" interval at every step.

A different way to proceed is to consider the derivative of the iteration function $f(g)=\frac{g+x/g}{2}$ with respect to $g$. This is $\frac{1-x/g^2}{2}$. If this is strictly less than $1$ in magnitude on some domain, and $f$ maps that domain into itself, then the convergence is justified by a rather well-known result, sometimes called the contraction mapping principle, other times called the Banach fixed point theorem.

Now how can we work with these two assumptions? First we see that the derivative is not always less than $1$ in magnitude: if $x/g^2>3$ then $\frac{1-x/g^2}{2}<-1$. So something bad might happen when $x/g_0^2>3$. However, the minimum value of $f$ over positive $g$ is $\sqrt{x}$, attained at $g=\sqrt{x}$. Thus $f$ maps every positive $g_0$ into $[\sqrt{x},\infty)$ and then maps that interval into itself. So $x/g_1^2\le1<3$, and this remains true for the rest of the procedure. Thus the hypotheses of the contraction mapping principle apply and you get convergence.
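A quick numerical illustration of this argument (my own sketch, not part of the original answer): after one averaging step, every positive starting guess lands at or above $\sqrt x$, and at such points the magnitude of the derivative $\frac{1-x/g^2}{2}$ is below $1/2$, so the map contracts from then on.

```java
// Check that one averaging step maps any positive guess into
// [sqrt(x), infinity), where |f'(g)| = |1 - x/g^2|/2 < 1/2.
public class ContractionCheck {
    static double step(double g, double x)       { return (g + x / g) / 2; }
    static double derivative(double g, double x) { return (1 - x / (g * g)) / 2; }

    public static void main(String[] args) {
        double x = 2.0;
        double[] starts = {0.01, 0.5, 1.0, 10.0, 1000.0};
        for (double g0 : starts) {
            double g1 = step(g0, x);  // g1 >= sqrt(x) by the AM-GM inequality
            System.out.printf("g0=%8.2f  g1=%12.6f  f'(g1)=%8.5f%n",
                              g0, g1, derivative(g1, x));
        }
    }
}
```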

Answer:

About the convergence:

Observe that with $g'$ being the new guess,

$$\frac{g'-\sqrt x}{g'+\sqrt x}=\dfrac{\dfrac12\left(g+\dfrac xg\right)-\sqrt x}{\dfrac12\left(g+\dfrac xg\right)+\sqrt x}=\frac{g^2-2g\sqrt x+x}{g^2+2g\sqrt x+x}=\left(\frac{g-\sqrt x}{g+\sqrt x}\right)^2.$$

The given ratio is smaller than $1$ in absolute value, and this equation shows that the new guess is closer to the solution than the previous one, for any positive $g$. By iterating,

$$\frac{g^{(k)}-\sqrt x}{g^{(k)}+\sqrt x}=\left(\frac{g-\sqrt x}{g+\sqrt x}\right)^{2^k}$$ and the ratio tends to $0$.

If the initial guess is poor, the ratio is close to $1$, say $1-\epsilon$, and

$$(1-\epsilon)^2\approx1-2\epsilon,$$ so $\epsilon$ roughly doubles each time: accuracy increases by about one bit per iteration. After a while, though, convergence becomes "explosive".

For example, $x=2,g=100$ gives the ratios

$$0.972\cdots,0.945\cdots,0.893\cdots,0.797\cdots,0.636\cdots,0.404\cdots,0.163\cdots,0.027\cdots,0.000\cdots$$
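The sequence above can be reproduced with a few lines of Java (a sketch of my own; the class and method names are not from the answer). Each printed ratio is the square of the previous one, as the identity predicts.

```java
// Reproduce the ratio sequence for x = 2, g = 100: the quantity
// (g - sqrt(x))/(g + sqrt(x)) squares at every averaging step.
public class RatioDemo {
    static double ratio(double g, double x) {
        double s = Math.sqrt(x);
        return (g - s) / (g + s);
    }

    static double step(double g, double x) { return (g + x / g) / 2; }

    public static void main(String[] args) {
        double x = 2.0, g = 100.0;
        for (int k = 0; k < 9; k++) {
            System.out.printf("%.3f%n", ratio(g, x));  // 0.972, 0.945, ...
            g = step(g, x);
        }
    }
}
```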