Coupled iteration for real-valued cube root


Timm Ahrendt, Schnelle Berechnung der Exponentialfunktion auf hohe Genauigkeit, Berlin: Logos 1999, section 3.2.3 "Gekoppelte Newton-Iteration für Quadratwurzeln" has an elegant division-free coupled iteration for the computation of square roots, for which an unpublished manuscript by A. Schönhage is cited as the reference.

Given $r_{approx}$, a low-precision approximation to $a^{-\frac{1}{2}}$, compute $s_{0}=a \cdot r_{approx}$ and $r_{0}=r_{approx}/2$, then iterate

$$\begin{aligned} s_{i+1} &= s_{i} + r_{i} \cdot (a - s_{i} \cdot s_{i}) \\ r_{i+1} &= r_{i} + r_{i} \cdot (1 - r_{i} \cdot 2 \cdot s_{i+1}) \end{aligned}$$

where the $s_{i}$ are approximations to $\sqrt{a}$ and the $r_{i}$ are approximations to $\frac{1}{2}a^{-\frac{1}{2}}$. The convergence is quadratic, as the method is related to the Newton iteration. For arbitrary-precision computation this coupled iteration has the advantage that to compute an $m$-bit square root, one only needs the reciprocal square root accurate to about $m/2$ bits. When using binary floating-point computation, this scheme allows for the efficient use of fused multiply-add, with the $i$-th step requiring four FMA operations plus a doubling, which is exact.
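To make the structure concrete, here is a minimal sketch of the scheme above in double precision (my own transcription; the function name and the fixed iteration count are mine, and real FMA hardware would fuse the multiply-adds that are written out plainly here):

```python
def coupled_sqrt(a, r_approx, iterations=6):
    """Return (s, r) with s ~ sqrt(a) and r ~ 0.5 * a**(-1/2).

    r_approx is a low-precision approximation to a**(-1/2).
    """
    s = a * r_approx      # s0 ~ sqrt(a)
    r = r_approx / 2      # r0 ~ 0.5 * a**(-1/2)
    for _ in range(iterations):
        s = s + r * (a - s * s)        # refine sqrt(a)
        r = r + r * (1 - r * (2 * s))  # refine 0.5 / sqrt(a)
    return s, r
```

For example, `coupled_sqrt(2.0, 0.7)` starts from the crude guess $0.7 \approx 2^{-1/2}$ and converges to $\sqrt{2}$ to double precision within a few steps.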

I have searched the literature for a similar division-free coupled scheme for real-valued cube roots, but have come up empty-handed. I looked for papers addressing cube roots in particular, and also for the more general case of $n$-th roots.

Are any division-free coupled iterations for cube roots available in the literature? The convergence should be at least quadratic. Particularly useful would be schemes that lend themselves naturally to the use of fused multiply-add (FMA), for maximum performance on modern hardware platforms.


Accepted answer:

For cube roots, solving $f(x)=x^3-a=0$, the sequence $r_i$ has to approximate the inverse of the derivative, $f'(x)^{-1}=\frac13x^{-2}$.

So, in analogy, let $r_{approx}$ be a coarse approximation of $a^{-\frac23}$, set $s_0=a\cdot r_{approx}$, $r_0=\frac13 r_{approx}$, and iterate $$ s_{i+1}=s_i+r_i\cdot(a-s_i^3). $$ Then compute $r_{i+1}$ as an approximate solution of $$1=3r_{i+1}s_{i+1}^2=3(r_i+Δr_i)s_{i+1}^2,$$ so that $3r_is_{i+1}^2=1-3Δr_is_{i+1}^2$. The right-hand side can be approximately inverted, giving $$ 1\approx(1+3Δr_is_{i+1}^2)\,3r_is_{i+1}^2, $$ so that an approximate solution with quadratic error, in a form analogous to the square-root case, is \begin{align} r_{i+1}&=r_i(1+3Δr_is_{i+1}^2) \\ &=r_i+r_i\cdot(1-3r_is_{i+1}^2). \end{align}
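The derivation above can be sketched directly in code (my transcription; names and iteration count are mine, and the multiply-adds written out here map naturally onto FMA):

```python
def coupled_cbrt(a, r_approx, iterations=6):
    """Return (s, r) with s ~ a**(1/3) and r ~ (1/3) * a**(-2/3).

    r_approx is a coarse approximation to a**(-2/3).
    """
    s = a * r_approx      # s0 ~ a**(1/3)
    r = r_approx / 3      # r0 ~ (1/3) * a**(-2/3)
    for _ in range(iterations):
        s = s + r * (a - s * s * s)      # refine a**(1/3)
        r = r + r * (1 - 3 * r * s * s)  # refine (1/3) * a**(-2/3), uses new s
    return s, r
```

Note that the $r$ update uses the freshly computed $s_{i+1}$, matching the derivation. Starting from, say, `coupled_cbrt(2.0, 0.63)` with $0.63 \approx 2^{-2/3}$, the iterates converge to $\sqrt[3]{2}$ to double precision within a few steps.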


In general, $$ f'(s_{i+1})^{-1}=\frac{r_i}{1-[1-r_if'(s_{i+1})]} =r_i\cdot\bigl(1+[1-r_if'(s_{i+1})]+O([1-r_if'(s_{i+1})]^2)\bigr) $$ suggests the update of the approximate inverse derivative as $$r_{i+1}=2r_i-r_i^2f'(s_{i+1}).$$
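As a sketch of this general recipe instantiated for $f(x)=x^n-a$ (my own instantiation; names are mine): the root is refined by a Newton-like step using the approximate inverse derivative $r$, and $r$ itself is refined by $r \leftarrow 2r - r^2 f'(s)$. For $n=2$ this reduces to the square-root update $r + r(1 - 2rs)$ up to the factor-of-two normalization.

```python
def coupled_nth_root(a, n, s0, r0, iterations=8):
    """Return (s, r) with s ~ a**(1/n) and r ~ 1/f'(s) = 1/(n * s**(n-1))."""
    s, r = s0, r0
    for _ in range(iterations):
        s = s + r * (a - s**n)   # Newton step with approximate 1/f'(s)
        fp = n * s**(n - 1)      # f'(s) for f(x) = x**n - a
        r = 2 * r - r * r * fp   # refine approximate inverse derivative
    return s, r
```

For example, `coupled_nth_root(32.0, 5, 2.1, 0.012)` starts near the fifth root $2$ with $r_0$ near $1/f'(2)=1/80$ and converges to it.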

Second answer:

You can use companion matrices of polynomials. Here is the companion matrix of a monic polynomial of degree 3:

$${\bf C} =\left[\begin{array}{ccc}0&0&-c_0\\1&0&-c_1\\0&1&-c_2\end{array}\right]$$

It is a monic polynomial, so $c_3$ is understood to be $1$, and we are interested in solving the equation $$\sum_k c_kx^k = 0$$

If we set $c_0=-a,\,c_1=0,\,c_2=0$, we get the equation $x^3 - a = 0$, whose real root is $x=\sqrt[3]{a}$.

Now the eigenvalues of this matrix are the roots of the polynomial. The eigenvalue of largest modulus is the one that does best under repeated multiplication, so we need to favor the real eigenvalue over the complex-conjugate pair. We can do this by adding $\lambda\bf I$ to the matrix: this shifts every eigenvalue by $+\lambda$ along the real axis and ensures that the real eigenvalue has the largest modulus.

For example, with $a=666$:

$${\bf C}+\lambda{\bf I} = \left[\begin{array}{ccc}\lambda&0&666\\1&\lambda&0\\0&1&\lambda\end{array}\right]$$

If we start with the vector $[1,1,1]^T$ and calculate $({\bf C}+9{\bf I})^{16}[1,1,1]^T$, the mean value of the componentwise quotient of successive iterates, minus the shift $9$, is $8.73289174558397$, which raised to the third power is $666.000004672982$: about one correct decimal digit every two iterations.

Row 1 costs two multiplications and one addition. The other rows can each be a fused multiply-add, or, if we select $\lambda = 2^k$, even a bit-shift plus add. The scheme is rather parallelizable, and the only place division could come into play is the last step, when we compare two iterates to estimate the eigenvalue.
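A minimal sketch reproducing the experiment described here (my transcription; the function name is mine, and I assume the "quotient" is taken componentwise between successive iterates, then averaged with the shift subtracted):

```python
def shifted_companion_cbrt(a, lam=9.0, iterations=16):
    """Estimate a**(1/3) by power iteration on the shifted companion matrix."""
    m = [[lam, 0.0, a],
         [1.0, lam, 0.0],
         [0.0, 1.0, lam]]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        prev = v
        # One matrix-vector product; rows 2 and 3 are single multiply-adds.
        v = [sum(m[i][j] * prev[j] for j in range(3)) for i in range(3)]
    # Mean componentwise ratio estimates the dominant eigenvalue lam + a**(1/3);
    # subtracting the shift recovers the cube root.
    return sum(v[i] / prev[i] for i in range(3)) / 3 - lam
```

With `shifted_companion_cbrt(666.0)`, the estimate cubed comes out close to $666$, consistent with the figures quoted above.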

Convergence plot when we use $+9{\bf I}$:

  • x is iteration,
  • y is $\log_2\left(\epsilon + \frac{|est-true|}{true}\right)$

(Plot omitted.) So convergence is a line on a log scale, averaging about 2 new bits of precision per iteration: 32 bits of precision after 16 iterations, starting with a vector full of ones.