Compute the new reciprocal from the old one for Barrett division

79 Views Asked by At

In a rather old post on the TCL mailing list Lars Hellström published code for a small function to convert binary numbers into their decimal representation. A simple divide & conquer algorithm using precomputed reciprocals to replace the large divisions with faster multiplications. The code itself is nicely structured and easy to follow, the construction of the reciprocals on the other side is not, although that maybe only me.

A bit of a background for those not fluent in programming with big-integers: this is, obviously, pure integer math over $\mathbb{Z}$, division is implemented as rounding to the next integer in direction of negative infinity and division/multiplication by powers of two is cheap.

The algorithm uses Barrett division: compute $\left\lceil2^k/n \right\rceil$ with $k$ chosen such that $k = 2\left\lceil\log_2 n \right\rceil$ to take advantage of the fact that the result is in that case close enough to the actual result that a single small and fast correction suffices.

But no matter what, the division $\left\lfloor 2^k/n \right\rfloor$ is still expensive and should be avoided if possible, so Lars Hellström's trick is to compute $\left\lceil\left(8\cdot2^{2k}\right)/n^2\right\rceil$ from $\left\lceil\left(8\cdot2^{k}\right)/n\right\rceil$. The start value, calculated in advance, is scaled and given as the constant $M = \left\lceil 8 \cdot 2^{20}/1000\right\rceil = 8389$ to the program.

A simple $m_{p + 1} = m_p^2$ with $m_0 = 2^k/n$ is not possible here because we are restricted to $\mathbb{Z}$ and the other conditions listed above, so he used another trick:

$$ M = \left\lfloor \left(\left\lfloor\frac{n^2M^4}{ 2^{2k+6}}\right\rfloor - \left(2M^2\right) \right) / 8\right\rfloor$$

and does a final $\left\lceil -M/8 \right\rceil$ to get the actual new reciprocal $m_{p+1} = \left\lceil 2^{2k}/n^2 \right\rceil$.

He noted above the line that computes the new $M$ that it is a round of Newton-Raphson and this is the point where I am stuck.

Can you explain the math behind it (not what Newton-Raphson root finding is but the actual implementation here) or at least point me in the right direction?

EDIT:

After peeling all of the layers off I got it down to the simple sum $\left\lfloor2^{2^mk + 1}/n^{\left(2^m\right)}\right\rfloor - \left\lfloor2^{2^mk}/n^{\left(2^m\right)}\right\rfloor$ with $m = \{0,1,2,3,\dots\}$ which is equal to the sequence we aim for: $\left\lceil2^{2^mk}/n^{\left(2^m\right)}\right\rceil$. It is just a matter of careful scaling (and precomputing the value at $m = 0$) to get rid of the large division by $n^{\left(2^m\right)}$ and replacing it with a "cheap" shift (a division by $2^n$).

But now I am even more irritated: what has that to do with "one round of Newton-Raphson" as he described it in a comment in the code?

$$ $$

1

There are 1 best solutions below

0
On

It took me a bit because I had the wrong ansatz first but the answer is embarrassingly simple.

The reciprocals are the values of the sequence $\left\lceil\left(2^s/n\right)^{\left(2^k\right)}\right\rceil = \left\lceil\left(2^{2^ks}\right)/n^{\left(2^k\right)}\right\rceil$ (with $k = \{0,1,2,3,\dots\}$, $s = 20$, and $n = 1\,000$). We cannot simply square because $2^s \nmid n$ and hence $\left\lceil\left(2^s/n\right)\right\rceil^{\left(2^k\right)} \not=\left\lceil\left(2^s/n\right)^{\left(2^k\right)}\right\rceil$ but we can use one round of Newton-Raphson to overcome this inconvenient obstacle.

$$x_{n+1} = x_n\left(2 - Dx_n\right) = 2x_n - Dx_n^2$$

By substituting $x_n = \left\lceil 2^s/n \right\rceil^2$ and $D = \left(n/2^s\right)^2$ we get

$$ x_{n+1} = 2\left\lceil 2^s/n\right\rceil^2 - \left\lfloor\frac{\left\lceil 2^s/n\right\rceil^4 n^2}{2^{2s}}\right\rfloor $$

The error $\left(1- Dx_n\right)^2$ is $1$, assuming the same substitutions as above, but that is misleading. The error without the truncating division is about $2\mathrm{e}{-34}$ and a simple addition of $1$ will not do it for all of the expected input (maximum $k$ is 22) so a small extra has been added.

A numerical experiment showed no problems with just 2 bits extra up to $k=24$, the one bit on top off that is most likely angst–allowance.