I want to perform a simple linear interpolation between $A$ and $B$ (which are binary floating-point values) using floating-point math with IEEE-754 round-to-nearest-or-even rounding rules, as accurately as possible. Please note that speed is not a big concern here.
I know of two basic approaches. I'll use the symbols $\oplus, \ominus, \otimes, \oslash$, following Knuth [1], to mean floating-point addition, subtraction, multiplication and division, respectively (actually I don't use division, but I've listed it for completeness).
(1) $\quad f(t) = A\,\oplus\,(B\ominus A)\otimes t$
(2) $\quad f(t) = A\otimes(1\ominus t)\,\oplus \,B\otimes t$
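In code, the two formulas are direct transcriptions, since each arithmetic operation in Python rounds once in IEEE-754 double precision (a quick sketch; the names lerp1 and lerp2 are mine):

```python
def lerp1(a, b, t):
    # Method (1): A ⊕ (B ⊖ A) ⊗ t
    return a + (b - a) * t

def lerp2(a, b, t):
    # Method (2): A ⊗ (1 ⊖ t) ⊕ B ⊗ t
    return a * (1.0 - t) + b * t
```

Since $u\otimes 1 = u$, $u\otimes v = 0$ when $v = 0$, and $u\oplus 0 = u$, lerp2 returns exactly $A$ at $t=0$ and exactly $B$ at $t=1$, while lerp1 is only guaranteed exact at $t=0$.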
Each method has its pros and cons. Method (1) is clearly monotonic in $t$, which is a very desirable property; it is not at all obvious to me that the same holds for method (2), and I suspect it may not. On the other hand, method (2) has the advantage that when $t = 1$ the result is exactly $B$, not an approximation, which is also a desirable property (and it yields exactly $A$ when $t = 0$, but method (1) does that too). That follows from the properties listed in [2], in particular:
$u\oplus v = v\oplus u$
$u\ominus v = u\oplus -v$
$u\oplus v = 0$ if and only if $v = -u$
$u\oplus 0 = u$
$u\otimes 1 = u$
$u\otimes v = 0$ if and only if $u = 0$ or $v = 0$
In [3] Knuth also discusses this case:
$u' = (u\oplus v)\ominus v$
which implicitly means that $u'$ may or may not be equal to $u$. Replacing $u$ with $B$ and $v$ with $-A$ and using the above rules, it follows that it's not guaranteed that $A\oplus(B\ominus A) = B$, meaning that method (1) does not always produce $B$ when $t = 1$.
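A concrete Python illustration of eq. (41) (the values are my own): with $u = 1$ and $v = 10^{16}$, the inner sum $u\oplus v$ is an exact tie that rounds to the even neighbour $v$, so $u$ vanishes entirely:

```python
u, v = 1.0, 1e16
# u + v is exactly halfway between 1e16 and 1e16 + 2 (ulp(1e16) = 2);
# ties-to-even keeps 1e16, whose significand is even.
u_prime = (u + v) - v
print(u_prime)  # 0.0, not 1.0
```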
So, here come my questions:
- Is method (2) guaranteed to be monotonic?
- If not, is there a better method that is accurate, monotonic and yields $A$ when $t = 0$ and $B$ when $t = 1$?
- If not (or you don't know), does method (1) when $t = 1$ always overshoot (that is, $A\oplus(B\ominus A)=A+(B-A)\cdot t$ for some $t \geq 1$)? Always undershoot (ditto for some $t \leq 1$)? Or sometimes overshoot and sometimes undershoot?
I assume that if method (1) always undershoots, I can make a special case when $t = 1$ to obtain the desired property of being exactly equal to $B$ when $t = 1$, but if it always overshoots, then I can't. That's the reason for question 3.
EDIT: I've found that the answer to question 3 is that it sometimes overshoots and sometimes undershoots. For example, in double precision:
-0x1.cae164da859c9p-1 + (0x1.eb4bf7b6b2d6ep-1 - (-0x1.cae164da859c9p-1))
results in 0x1.eb4bf7b6b2d6fp-1, which is 1 ulp greater than the original $B$, while
-0x1.be03888ad585cp-1 + (0x1.0d9940702d541p-1 - (-0x1.be03888ad585cp-1))
results in 0x1.0d9940702d540p-1, which is 1 ulp less than the original $B$. On the other hand, the method that I planned (special-casing $t=1$) won't fly, because I've found cases where $t < 1$ and $A\oplus(B\ominus A)\otimes t > B$, for example:
t = 0x1.fffffffffffffp-1
A = 0x1.afb669777cbfdp+2
B = 0x1.bd7b786d2fd28p+1
$A \oplus (B \ominus A)\otimes t =\,$ 0x1.bd7b786d2fd29p+1
which means that if method (1) is to be used, the only strategy that may work is clamping.
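All of the figures above are easy to reproduce in Python, whose float is an IEEE-754 double and which can parse the hex literals via float.fromhex (a quick check):

```python
fh = float.fromhex

# 1-ulp overshoot of method (1) at t = 1:
A, B = fh('-0x1.cae164da859c9p-1'), fh('0x1.eb4bf7b6b2d6ep-1')
r1 = A + (B - A)
print(r1.hex())  # 0x1.eb4bf7b6b2d6fp-1, 1 ulp above B

# 1-ulp undershoot at t = 1:
A, B = fh('-0x1.be03888ad585cp-1'), fh('0x1.0d9940702d541p-1')
r2 = A + (B - A)
print(r2.hex())  # 0x1.0d9940702d540p-1, 1 ulp below B

# a t < 1 for which A ⊕ (B ⊖ A) ⊗ t is still greater than B:
t = fh('0x1.fffffffffffffp-1')
A, B = fh('0x1.afb669777cbfdp+2'), fh('0x1.bd7b786d2fd28p+1')
r3 = A + (B - A) * t
print(r3.hex())  # 0x1.bd7b786d2fd29p+1
```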
Update: As noted by Davis Herring in a comment and later verified by me, special-casing $t=1$ actually works.
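With that settled, the fixed-up method (1) is just (a Python sketch; the name is mine):

```python
def lerp1_exact(a, b, t):
    # Method (1) with the special case: exact at t == 1 by construction and
    # at t == 0 as before; per the update above, the special case also
    # preserves monotonicity.
    if t == 1.0:
        return b
    return a + (b - a) * t
```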
References
[1] D. E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms, third edition, p. 215.
[2] Op. cit., pp. 230–231.
[3] Op. cit., p. 235, eq. (41).
Method (2) is monotonic if you are using round-to-nearest, ties-to-even (which is fortunately the default).
Let's consider $A=B=1$ with half-precision numbers ('cause they're short) and $t=5/2^{12}$. Since multiplying by $1$ is exact, method (2) reduces to computing $t\oplus(1\ominus t)$:
t   = 0.000000000101
1-t = 0.111111111011

But that's too long: we only get 11 significand bits, so the value we actually get for $1\ominus t$ depends on the rounding mode. In "towards $0$" (truncate) and "towards $-\infty$" (floor) modes:

1-t = 0.11111111101

In the other modes, "towards $+\infty$" (ceiling), "ties away from $0$", and "ties to even":

1-t = 0.11111111110

Now let's add them back together.
Truncate and floor: $t+(1-t)=0.11111111111(1)=0.11111111111<1$.
Ceiling: $t+(1-t)=1.0000000000(01)=1.0000000001>1$.
Ties away from 0 and ties to even: $t+(1-t)=1.0000000000(01)=1.0000000000=1$. The discarded part is only a quarter of an ulp here, because the ulp doubles at $1$, so it is below the halfway point and both round-to-nearest modes round back down to exactly $1$.
Some more analysis tells us what's going on: the goal is to have the two rounding steps counteract each other. This never happens with floor/truncate/ceiling, which always round in a fixed direction, so the errors can accumulate. With ties to even the steps always compensate: for values of $t$ whose $1-t$ step rounds down ($3/2^{12}$, for instance), the addition step rounds up, and vice versa; and in the tie above, where $1\ominus t$ rounds up, the addition rounds back down, as we just saw.
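The same cancellation can be checked in double precision, where the analogous tie sits at $t=5/2^{54}$ (a quick Python check; CPython floats are IEEE-754 doubles with ties-to-even):

```python
t = 5 * 2.0**-54
one_minus_t = 1.0 - t          # tie: rounds to the even neighbour 1 - 2^-52
assert one_minus_t == 1.0 - 2.0**-52
assert t + one_minus_t == 1.0  # the addition rounds back down: exactly 1

# A t whose 1 - t step rounds down is repaired in the other direction:
t = 3 * 2.0**-54
assert t + (1.0 - t) == 1.0
```

This is a demonstration, not a proof, but the pairing of opposite roundings is visible in both cases.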