Solving equations like $xe^x = c$ via functional iteration


Yesterday I randomly thought of solving $xe^x = c$ via functional iteration (FI), by manipulating the equation into a form "$x = \cdots$" that gives the 'fastest' convergence rate regardless of the starting estimate.

I found by intuition-guided trial and error that the equation $x = ( c x^r e^{-x} )^\frac{1}{r+1}$ where $r = \ln(c)$ does exceptionally well for large $c$, as long as the starting estimate is positive.
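
For concreteness, here is a minimal Python sketch of this iteration (the function name, tolerance and iteration cap are my choices, not part of the question; it assumes large $c$ so that $r > 0$):

```python
import math

def first_fi(c, x0=1.0, tol=1e-12, max_iter=200):
    """Iterate x_{n+1} = (c * x_n^r * e^{-x_n})^(1/(r+1)) with r = ln(c).

    Illustrative sketch: intended for large c (so r > 0) and positive x0.
    """
    r = math.log(c)
    x = x0
    for _ in range(max_iter):
        x_new = (c * x**r * math.exp(-x)) ** (1.0 / (r + 1.0))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(first_fi(1e6))  # close to W(1e6), roughly 11.38
```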

In contrast, the Newton-Raphson (NR) iteration (on $x e^x - c = 0$) does poorly if the starting estimate is too far from the root. I know the details of this behaviour (see https://math.stackexchange.com/a/800070), but I don't know much about FI, so I don't really understand how to choose a rearrangement that ensures 'fast' convergence.
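
To see the slow behaviour concretely, here is a small sketch of plain NR on $xe^x - c = 0$ (the helper name and step count are mine): from a start well above the root, each update removes only about one unit, because the step is roughly $x/(x+1)$ there.

```python
import math

def newton_xex(c, x0, n_steps=40):
    """Plain Newton-Raphson on F(x) = x*e^x - c (illustrative sketch)."""
    x = x0
    for _ in range(n_steps):
        ex = math.exp(x)
        # far above the root the step is ~ x/(x+1), i.e. barely one unit
        x -= (x * ex - c) / ((x + 1.0) * ex)
    return x

print(newton_xex(100.0, x0=30.0))  # spends ~30 steps creeping down to W(100) ~ 3.39
```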

After that, I found that the equation $x = (x+1)/(e^x/c+1)$ gives an even better FI, which outperforms both my first FI and NR.
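
A sketch of the improved iteration in the same style (again, naming and stopping rule are mine; note that a literal float implementation needs $x_0$ below about 709, or `math.exp` overflows):

```python
import math

def improved_fi(c, x0=1.0, tol=1e-12, max_iter=100):
    """Iterate x_{n+1} = (x_n + 1) / (e^{x_n}/c + 1) (illustrative sketch)."""
    x = x0
    for _ in range(max_iter):
        x_new = (x + 1.0) / (math.exp(x) / c + 1.0)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(improved_fi(1e6))  # close to W(1e6), roughly 11.38
```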

My first question: why is this FI so good (Kibble has already addressed this part, so you don't need to repeat it in your answer), and how does one find such a fast FI systematically?

After that, I found that NR-log (namely NR on $x + \ln(x) - \ln(c) = 0$) converges much faster than my improved FI, but only if the initial estimate is not too large; otherwise it overshoots to a negative next estimate and the logarithm fails on the following iteration.
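
A sketch of NR-log, including the failure mode. (If I've done the algebra right, the update simplifies to $x_{n+1} = \frac{1-\ln(x_n/c)}{1+1/x_n}$, which goes negative once $x_n > ec$; the helper name and constants are mine.)

```python
import math

def newton_log(c, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson on G(x) = x + ln(x) - ln(c) (illustrative sketch)."""
    x = x0
    for _ in range(max_iter):
        # math.log raises ValueError here if the previous step went negative
        step = (x + math.log(x) - math.log(c)) / (1.0 + 1.0 / x)
        x -= step
        if abs(step) < tol:
            return x
    return x

print(newton_log(100.0, x0=4.0))  # converges very quickly to W(100) ~ 3.3856
try:
    newton_log(100.0, x0=1000.0)  # 1000 > e*100, so the first step goes negative
except ValueError as e:
    print("crashed as described:", e)
```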

My second question: is there a generic technique that, given any equation in $x$, produces an FI for solving it over positive $x$ that converges 'fast' to the root from any positive starting point? My improved FI almost does this, but I have no idea how to construct such an FI for other equations in general, nor do I know whether there is a 'better' one than mine.

Here I say 'fast' and 'better' but I don't really know how to precisely specify what these notions should be, so feel free to specify your own as long as it makes sense.

My intuition

I excluded these at first because my question is already quite long, but here is the intuition behind my FIs. First I was thinking that perhaps there is a generic method if we have a strictly increasing function $f$ such that $f(0) < 0$ and we want to find $y > 0$ such that $f(y) = 0$. My intuition was that to ensure quadratic convergence we want to find a function $g$ such that the original equation is equivalent to $x = g(x)$ and where:

  1. $g$ is bounded on $\mathbb{R}_{\ge 0}$.

  2. $|g'| < 1$.

  3. $g'(y) = 0$.

Both of my FIs satisfy the first two conditions, but only the improved one satisfies the third. I don't know what better criteria there are.
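
As a sanity check on conditions 1 and 3 for my improved FI, here is a small numerical probe at the specific value $c = 100$ (grid, step size and helper names are mine; this is a spot check, not a proof):

```python
import math

c = 100.0

def g(x):
    return (x + 1.0) / (math.exp(x) / c + 1.0)

def dg(x, h=1e-6):
    # central-difference approximation to g'(x)
    return (g(x + h) - g(x - h)) / (2.0 * h)

# locate the fixpoint y = W(c) by iterating g itself
y = 1.0
for _ in range(100):
    y = g(y)

grid = [k / 100.0 for k in range(1, 2001)]           # x from 0.01 to 20
print("sup of g on grid:", max(g(x) for x in grid))  # bounded (condition 1)
print("g'(y):", dg(y))                               # ~ 0 at the root (condition 3)
```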

Extra details

However, if we're looking for an efficient algorithm to compute $W(c)$ in particular, then NR-log with starting estimate $\ln(c+1)$ gives uniform quadratic convergence over all $c > 0$, which is incredible! In contrast, NR takes $\Omega(\ln(\ln(c)))$ steps to reach uniform quadratic convergence, while (my improved) FI takes $\Omega(\ln(c))$ steps. But NR and FI both achieve uniform quadratic convergence if we instead use the starting estimate $\ln(c+1)-\ln(\ln(c+1)+1)$, so I can't really tell which one is better. (NR-log only has uniform linear convergence from this starting estimate, because of small $c$.)
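
One can see the NR-log claim empirically with a step-counting harness (names and tolerance mine): the iteration count from the starting estimate $\ln(c+1)$ stays small and essentially flat as $c$ grows, which is what uniform quadratic convergence looks like in practice.

```python
import math

def newton_log_count(c, tol=1e-13, max_iter=100):
    """Count NR-log steps from the starting estimate ln(c+1) (sketch)."""
    x = math.log(c + 1.0)
    for n in range(1, max_iter + 1):
        step = (x + math.log(x) - math.log(c)) / (1.0 + 1.0 / x)
        x -= step
        if abs(step) < tol:
            return n
    return max_iter

for c in (1e1, 1e3, 1e6, 1e12, 1e50, 1e100):
    print(c, newton_log_count(c))  # step counts stay small and flat
```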


Answer

I will here address the first question: why is this one ($g(x)$ below) so good, and how does one find such a fast functional iteration systematically?

Let's take your two example functions:

$$f(x) \equiv (cx^re^{-x})^{\frac{1}{r+1}}$$ $$g(x) \equiv \frac{x+1}{\frac{e^x}{c}+1}$$

Expand in a Taylor series about the fixpoint $x = W(c)$. We then find

$$f(x) = W(c) + A\,(x-W(c)) + O\big((x-W(c))^2\big)$$ $$g(x) = W(c) + B\,(x-W(c))^2 + O\big((x-W(c))^3\big)$$

where $A$ and $B$ are constants depending on $c$, with $|A| < 1$. For a concrete example take $c=100$; then we find $A \approx 0.2$ and $|B| \approx 0.4$. Thus when we are close to the fixpoint, the recursions $x_{n+1} = f(x_n)$ and $y_{n+1} = g(y_n)$ satisfy

$$x_{n+1} - W(c) \simeq A\,(x_n-W(c)) \implies |x_{n+1}-W(c)| \propto |A|^n$$

$$y_{n+1} - W(c) \simeq B\,(y_n-W(c))^2 \implies |y_{n+1}-W(c)| \propto |B|^{2^n}$$
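
(For readers who want to verify the quoted constants, here is a small finite-difference sketch; the step size and the way the fixpoint is located are my choices, not part of the argument.)

```python
import math

c = 100.0
r = math.log(c)

def f(x):
    return (c * x**r * math.exp(-x)) ** (1.0 / (r + 1.0))

def g(x):
    return (x + 1.0) / (math.exp(x) / c + 1.0)

# locate the fixpoint W(c) by iterating the quadratically convergent g
w = 1.0
for _ in range(60):
    w = g(w)

h = 1e-4
A = (f(w + h) - f(w - h)) / (2.0 * h)                  # A = f'(W(c))
B = (g(w + h) - 2.0 * g(w) + g(w - h)) / (2.0 * h**2)  # B = g''(W(c)) / 2
print(A, B)  # about 0.218 and -0.386, i.e. |A| ~ 0.2, |B| ~ 0.4
```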

We now see why $g$ works so much better: as $n\to\infty$, $|B|^{2^n}$ decreases much faster than $|A|^n$. In technical terms we say that $f$ has a linear convergence rate while $g$ has a quadratic convergence rate.
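
The two regimes are easy to observe numerically; here is a short sketch (starting point and iteration count are arbitrary choices of mine):

```python
import math

c = 100.0
r = math.log(c)

def f(x):
    return (c * x**r * math.exp(-x)) ** (1.0 / (r + 1.0))

def g(x):
    return (x + 1.0) / (math.exp(x) / c + 1.0)

w = 1.0
for _ in range(60):
    w = g(w)  # fixpoint W(100), accurate to machine precision

x = y = 3.0  # same starting point near the root for both iterations
for n in range(1, 7):
    x, y = f(x), g(y)
    print(n, abs(x - w), abs(y - w))
# the f-error shrinks by a factor ~|A| per step; the g-error roughly squares
```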

To get a good convergence rate we want a function $h(x)$ which satisfies $h(x) \simeq W(c) + C(x-W(c))^n$ close to the fixpoint, with $|C|$ as small and $n$ as large as possible (plus it should also ensure convergence to the fixpoint even if we start far away from it). To get better than linear convergence we need $h'(x) = 0$ at the fixpoint. Finding such functions is often more an art than a science; use your intuition plus some trial and error.