Why does Ridders' method work as well as it does?


I've just read section 9.2.1 in Numerical Recipes Ed. 3 (Press et al. 2007), which describes Ridders' method of root finding. I understand that allowing for some curvature of the function by linearizing three points with an exponential will probably get you a better approximation and more rapid convergence, but I have no idea how they get that it converges quadratically, and the motivation behind choosing an exponential (rather than e.g. a polynomial) is not clear to me.

So, specifically, I'm wondering (1) why linearize with an exponential, and (2) why the method converges quadratically.
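For context, here is a minimal Python sketch of the iteration as I understand it from §9.2.1 (my own transcription of the published formula, not the book's code; the function name and interface are mine):

```python
import math

def ridders(f, a, b, tol=1e-12, max_iter=60):
    """Ridders' method: a bisection step followed by a false-position
    step on exponentially weighted function values."""
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0.0:
        raise ValueError("root is not bracketed by [a, b]")
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        m = 0.5 * (a + b)                 # plain bisection point
        fm = f(m)
        s = math.sqrt(fm * fm - fa * fb)  # > 0 because fa * fb < 0
        if s == 0.0:
            return m
        # Ridders' update: false position applied to f weighted by e^{Qx}
        x = m + (m - a) * (1.0 if fa > fb else -1.0) * fm / s
        fx = f(x)
        if fx == 0.0 or abs(b - a) < tol:
            return x
        # keep the sub-bracket whose endpoint values have opposite signs
        if fm * fx < 0.0:
            a, fa, b, fb = (m, fm, x, fx) if m < x else (x, fx, m, fm)
        elif fa * fx < 0.0:
            b, fb = x, fx
        else:
            a, fa = x, fx
    return x
```

For example, `ridders(lambda x: math.cos(x) - x, 0.0, 1.0)` converges to the fixed point of cosine near 0.7391 in a handful of iterations.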

I tried looking for Ridders' original paper "A new algorithm for computing a single root of a real continuous function", but the only copy I could find was in a journal that my school's library does not give me free access to.

UPDATE:

I've realized that the fact that an exponential is strictly positive means that weighting the three points with an exponential won't change their signs. Hence, the root of the linearization will have to land inside the original bracket.

Polynomials, rationals, sinusoids, and logarithms all take both positive and negative values, so using them as weights risks producing interpolated roots outside the bracket. One of these squared might be usable, but perhaps non-monotonicity causes some undesirable effect. Since these exhaust the simple functions I am aware of, exponentials seem to be the only readily available weighting function with this property.

A constant function is also technically single-signed and monotone, but it can't be used to linearize stuff.
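To make the bracket-preservation argument concrete, here is a small sketch (the helper `ridders_point` and its interface are my own) that computes the weighting factor $e^Q$ explicitly, so one can check that the weighted values keep their signs and lie on a straight line before false position is applied to them:

```python
import math

def ridders_point(f, x0, x2):
    """Weight f at x0, x1, x2 by 1, e^Q, e^{2Q}.  Since e^Q > 0, no
    sign can flip, so the secant root must land inside [x0, x2]."""
    x1 = 0.5 * (x0 + x2)
    f0, f1, f2 = f(x0), f(x1), f(x2)
    # choose e^Q so the weighted values are collinear, i.e. solve the
    # quadratic  f0 - 2 f1 e^Q + f2 e^{2Q} = 0  taking the positive root
    eQ = (f1 - math.copysign(math.sqrt(f1 * f1 - f0 * f2), f0)) / f2
    g0, g1, g2 = f0, f1 * eQ, f2 * eQ * eQ   # weighted values
    # false position through (x0, g0) and (x2, g2)
    x3 = x2 - g2 * (x2 - x0) / (g2 - g0)
    return x3, (g0, g1, g2)
```

On $f(x) = \cos x - x$ over $[0, 1]$, the weighted values come out with the same signs as the originals and the midpoint value lands exactly on the chord, which is the whole trick.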

There are 2 best solutions below


Not an answer, just a plot that surprised me:
I thought that Ridders' method was the same as fitting $a + b \, e^{c x}$ to the 3 points. Not so:

[plot comparing Ridders' update point with the fit $a + b \, e^{c x}$ through the same 3 points]


More in-depth analysis

A very good general answer has already been given, but I'll add some more detailed analysis of the behavior of Ridders' method.

The first notable point is that the error approximation is signed and exact (obtained by replacing $r$ in the derivatives with a point in the interval). This matters because the sign of the new error is given by the sign of $-Ce_1$, where $e_1$ is the error of the bisection step and $C$ lumps together all of the approximately constant terms. If $C$ is negative, the new error has the same sign as $e_1$, which drags in the farther endpoint on every iteration and yields fast convergence. If $C$ is positive, the new error has the opposite sign of $e_1$, so one endpoint converges by bisection alone while the other bound converges only roughly linearly (the number of accurate digits grows quadratically in the number of iterations).

In the event that $C<0$, it can furthermore be shown that quadratic convergence does indeed occur. It is worth noting, however, that the Ostrowski (efficiency) index is only $\sqrt2\approx1.41$, since every iteration actually consists of two steps (two function evaluations).

One can check that Tim Peters's examples of slow convergence for Ridders' method all satisfy $C>0$. In particular, it is easy to see that Ridders' method converges slowly for any quadratic function, for example.

Other 3-point methods will generally outperform Ridders' method, giving higher Ostrowski indices even in their worst cases and converging faster in pathological cases (such as high-order roots). Even if one considers only the order of convergence and neglects the Ostrowski index, other methods' worst cases converge much faster than the staggeringly slow case of Ridders' method. Dekker's, Brent's, and Chandrupatla's methods, to name a few, are generally much better.

Intuitively, these methods perform much better because their third point is much closer to the root than the bisection point. Using bisection, however, leads to a much simpler formula, thanks to cancellations. If one used the same approach as these hybrid methods, much faster worst-case convergence could be attained.