Derive worst case error inputs for floating addition/multiplication


The following floating-point model is well known:

$$ fl(x \;op\; y) = \left( x \;op\; y \right)(1+\delta), \qquad |\delta| \leq u = 2^{-t-1} $$

(Section 2.2, formula (2.4) of this book).

I would like to know whether this has already been studied. Is there a well-known constructive proof showing, for the specific cases of addition and multiplication, a proposition like

$$ \forall x \in F \;\; \exists y \in F : \left| \frac{fl(x \;op\; y) - (x \;op\; y)}{x \;op\; y} \right| = u \;? $$

Is there any reference for this? If not, is there a simple counterexample?


There are 2 answers below.


If that equation is the only thing you have, then there is nothing you can do. On the other hand, if you use IEEE 754 floating-point operations with standard rounding (round-to-nearest), then any result $x$ in the range $2^k \le x \le 2^{k+1}$ will be rounded to the nearest multiple of $2^k \cdot u$, and you can find the maximum relative error from that.


I interpret the question as follows. In IEEE-754 floating-point arithmetic with a rounding mode of "to nearest or even", the result of a basic arithmetic operation will differ from the infinitely precise result by no more than half an ulp, that is, the error is $\le \frac{1}{2}$ ulp. For addition and multiplication, are there pairs of floating-point operands $a, b$ for which the error in $a \;op\; b$ is exactly $\frac{1}{2}$ ulp?

The error is exactly $\frac{1}{2}$ ulp if the result falls exactly on the half-way point between two representable floating-point numbers. This is known as a tie case. It is easily shown by manual construction that such cases exist for both addition and multiplication:

$(1-ulp)+0.5$ is a tie case arising from addition, while $(1-2\,ulp) \cdot 1.5$ is a tie case arising from multiplication. In programming languages like C and C++, operands like $(1-ulp)$ can easily be constructed with the nextafter() / nextafterf() functions, if one desires a practical test.

The question remains: can one give a general algorithm that generates pairs of operands resulting in tie cases? This question is relevant, for example, to the generation of test vectors for new floating-point hardware. Traditionally, such verification relied on canned test vectors, such as Jerome Coonen's collection of (presumably) manually constructed cases. More recently, there have been publications advocating an algorithmic approach, e.g.:

Merav Aharoni, et al. "Solving constraints on the invisible bits of the intermediate result for floating-point verification." In Proceedings 17th IEEE Symposium on Computer Arithmetic (ARITH'05). IEEE 2005, pp. 76-83

The invisible bits mentioned in the paper's title refer to the round and sticky bits prior to rounding; for tie cases we would impose the constraints that the round bit be $1$ and the sticky bit be $0$.