I'd like confirmation of something that is probably silly.
Let $x$ be a floating-point number representable with $e$ bits for the exponent and $m$ bits for the mantissa, and let $f$ be an elementary function; if it helps, you may assume $D(f) = A = [1,2)$ and $f(A) = [1,2)$, since any elementary function can be transformed so that it is defined on the interval I just specified. Assume I've implemented an algorithm $\psi$ that approximates $f$ in the same floating-point system.
I was wondering, except for trivial cases, what is the MINIMUM error such an algorithm could achieve, whatever the algorithm is. The answer should trivially be $0.5$ ulp, right?
My answer is motivated by the following:
The set of floating-point numbers I've defined is finite, so I can trivially implement the computation of $f$ as $\psi(x) = \circ(f(x))$, where $\circ(\cdot)$ denotes the rounding operation. That is, I can sample the original function at every floating-point number, round the result, and store it in a table. Two situations can occur:
- $f(x)$ is a floating-point number; in that case the error is $0$. This is what I would call trivial.
- $f(x)$ is not a floating-point number; in that case rounding to nearest gives an error of at most $0.5$ ulp. This is not trivial.
So the minimum error I can achieve is $0.5$ ulp, right? A theoretical lower bound for the non-trivial situation is what I'm looking for.
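As a sketch of this table-based construction (my own illustration, not from the question): assume a toy format with $m = 8$ mantissa bits on the binade $[1,2)$, so the ulp is the constant $2^{-m}$, and take $f = \sqrt{\cdot}$ as a stand-in, since it maps $[1,2)$ into $[1,2)$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50          # plenty of guard digits for the toy format

M = 8                           # mantissa bits of a hypothetical toy format
ULP = Decimal(2) ** -M          # ulp is constant on the binade [1, 2)

def correctly_rounded_sqrt(x: Decimal) -> Decimal:
    """psi(x) = o(f(x)): round sqrt(x) to the nearest toy float."""
    exact = x.sqrt()
    # round to nearest (Decimal's default rounding is half-even)
    return (exact / ULP).to_integral_value() * ULP

# Build the look-up table by sampling every toy float in [1, 2).
table = {}
for k in range(2 ** M):
    x = Decimal(1) + k * ULP
    table[x] = correctly_rounded_sqrt(x)

# Every entry is within 0.5 ulp of the exact value -- the bound from the post.
worst = max(abs(psi - x.sqrt()) for x, psi in table.items())
assert worst <= ULP / 2
```

The table is finite (here $2^m = 256$ entries), which is what makes the "sample everything, round, and store" argument work.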
I guess the following is what you are asking:
Incidentally, another way to define $d_g$ is $$ d_g(f) = \sum_{x \in \mathbf{FP}} |f(x) - g(x)|. $$ Since the terms of the sum can be minimized independently, if $f_0$ minimizes $d_g$ then for every other floating point function $f$ and every floating point number $x$ it must be that $$ |f(x) - g(x) | \geq |f_0(x) - g(x)|, $$ which is probably what you want in terms of "most accurate representation".
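That independence claim can be brute-force checked on a toy set (my own illustration, with an arbitrary non-floating-point target $g(x) = \tfrac98 x$): the global minimizer of $d_g$ does no better than rounding $g$ pointwise to the nearest floating-point number.

```python
from fractions import Fraction
from itertools import product

# Toy floating-point set: 2 mantissa bits on [1, 2) -> four numbers.
FP = [Fraction(1) + Fraction(k, 4) for k in range(4)]

def g(x):
    # An arbitrary target with no exact representation in FP (assumption).
    return Fraction(9, 8) * x

def d(f):
    # d_g from above; f is given as a tuple of values, one per point of FP.
    return sum(abs(fx - g(x)) for x, fx in zip(FP, f))

# Brute-force every FP-valued function on this 4-point domain (4^4 = 256).
best = min(product(FP, repeat=4), key=d)

# The minimum of the sum equals the sum of pointwise minima,
# because the terms are independent.
pointwise = tuple(min(FP, key=lambda v, x=x: abs(v - g(x))) for x in FP)
assert d(best) == d(pointwise)
```

(Ties can make `best` and `pointwise` differ as tuples, but their $d_g$ values always agree.)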
In terms of whether you can do computationally better than implementing a look-up table: it really depends on the function. For example, consider the function $$ g(x) = \frac32 x. $$ It has no exact floating point representation; in fact, for half of the floating point numbers, $g(x)$ is as far from being a floating point number as possible (it falls exactly midway between two consecutive floats, since multiplying by $3/2$ needs one extra mantissa bit). But there is an easy algorithm that does as well as the LUT.
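The easy algorithm here is just a single multiplication: $3/2$ is exactly representable, and IEEE 754 guarantees that each basic operation is correctly rounded, so one multiply already achieves the same $0.5$ ulp bound as the table. A sketch in Python (whose `float` is an IEEE 754 double), checked against exact rational arithmetic:

```python
from fractions import Fraction
import math

def g_fp(x: float) -> float:
    # 1.5 is exactly representable, and IEEE 754 multiplication is
    # correctly rounded -- so this single multiply matches the LUT.
    return 1.5 * x

# Compare against exact rational arithmetic on a few inputs in [1, 2).
# 1.2500000000000002 has an odd last mantissa bit, the worst case:
# the exact product lands midway between two floats (error exactly 0.5 ulp).
for x in [1.0, 1.1, 1.2500000000000002, 1.9999999999999998]:
    exact = Fraction(3, 2) * Fraction(x)
    computed = Fraction(g_fp(x))
    ulp = Fraction(math.ulp(g_fp(x)))
    assert abs(computed - exact) <= ulp / 2
```

So for this $g$, no table is needed: the hardware's own rounding already delivers the optimum.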