Error Propagation in Floating-Point Multiplication


Wikipedia (Machine epsilon) tells me that the result of a multiplication of two floating-point numbers, with a rounding-induced relative error $\epsilon$, still has only the relative error $\epsilon$.

Why do the rounding errors not add up?


There are two answers below.

BEST ANSWER

The paragraph says that the model used assumes the input values are exact. They are then multiplied and the result rounded to fit in floating point. That rounding is what creates the relative error of $\epsilon$.

You are correct that if the input values already carry an error, it will flow through the calculation to the result. If you want to compute $e \pi$, you would first compute $e$ and $\pi$ and store values, each of which might be off by a factor of up to $1+\epsilon$. When you multiply them, the multiplication of the stored values adds another factor of $1+\epsilon$, so the computed product could be off by roughly $3\epsilon$ (to first order, $(1+\epsilon)^3 \approx 1+3\epsilon$). The point of the paragraph is that the multiplication operation itself adds only one $\epsilon$ to the uncertainty.
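One way to see this separation of error sources is exact rational arithmetic: round two exact values to binary64, multiply, and compare against the exact product. A minimal sketch using Python's standard library; the inputs 1/3 and 1/7 are arbitrary choices that are not exactly representable in binary, so each rounding contributes:

```python
from fractions import Fraction
import sys

eps = sys.float_info.epsilon  # 2**-52 for binary64

x_exact = Fraction(1, 3)
y_exact = Fraction(1, 7)

x = float(x_exact)   # input rounding: relative error up to eps/2
y = float(y_exact)   # input rounding: relative error up to eps/2
p = x * y            # the multiplication adds at most one more eps/2

rel_err = abs(Fraction(p) - x_exact * y_exact) / (x_exact * y_exact)
print(float(rel_err) / eps)  # at most ~1.5, i.e. three half-ulp roundings
```

The bound of three half-ulps matches the $3\epsilon$ argument above (with $\epsilon$ read as the rounding unit); in practice the individual errors often partially cancel.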


Keep in mind that the Wikipedia article comes from the school of willful pretending: that floating-point arithmetic in "modern" processors is "exact" up to a "rounding error". The statement "floating-point operations are done as if it were possible to perform the infinite-precision operation" is essentially false, since literally performing the infinite-precision operation would take infinite time. FP hardware uses a number of tricks (such as guard digits) to maintain the perception that no bits are lost, but that works only when you start with numbers that don't push the limits and when the number of operations is small.

If you ran a very good approximation of infinite-precision arithmetic and compared it with what, say, commercial binary64 hardware gives you, you would see how bits get lost.

The minimal relative error for multiplication is the equivalent of 1 bit: one ulp (unit in the last place), $b^{e-p+1}$, where $b$ is the base, $e$ the exponent of the result, and $p$ the precision (bits in the mantissa). Strictly, a relative error should be expressed as a percentage, but since we are talking about the minimal error, the magnitude of the result is treated as fixed, so we can count in bits as if we were taking the log.
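For binary64 this ulp formula can be checked directly with Python's `math.ulp` (a quick sketch; here $e$ is the exponent of the value and $p = 53$):

```python
import math

p = 53  # binary64 precision (significand bits, including the hidden bit)
for x in [1.0, 1.5, 2.0, 10.0]:
    e = math.floor(math.log2(x))               # exponent of the value
    print(x, math.ulp(x), 2.0 ** (e - p + 1))  # the last two columns match
```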

With the best luck, after $e+p$ multiplications you have a very expensive random number generator :-) That would be 1275 multiplications in binary64 (double precision). The willful-pretending doctrine can ignore this inherent loss, since no one has the time or the will to check these things for more complex operations.
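The accumulation itself is easy to observe by tracking a long chain of products exactly. A sketch: the dyadic factors $1 + k \cdot 2^{-20}$ are chosen (arbitrarily) so that they convert to binary64 with no input rounding at all, leaving only the per-multiplication rounding to accumulate:

```python
from fractions import Fraction
import sys

eps = sys.float_info.epsilon
n = 1000

acc = 1.0
acc_exact = Fraction(1)
for k in range(1, n + 1):
    f = 1.0 + k * 2.0 ** -20   # exactly representable in binary64
    acc *= f                   # at most eps/2 relative error per step
    acc_exact *= Fraction(f)   # exact reference product

rel_err = abs(Fraction(acc) - acc_exact) / acc_exact
print(float(rel_err) / eps)    # grows with n; worst case about n/2
```

The worst-case bound after $n$ multiplications is $(1+\epsilon/2)^n - 1 \approx n\epsilon/2$; the measured error is usually smaller because individual roundings partially cancel, but it is the worst case that determines when results become meaningless.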

This is why IBM's hardware has had binary128 since 1998, and IBM was able to charge millions for it (still is). You can't compute a geometric mean worth a penny without it. Its limit is 16495 multiplications, and limiting the number of data points to 14k-15k gives you enough breathing room for additional operations.

Related to the geometric mean: a square root can theoretically "recover precision", but only if the intermediate result(s) carried all the extra bits. So sparing the last ~1500 data points gives sqrt enough margin to recover full binary64 precision (from a binary128 intermediate), provided the algorithm was genuinely lossless. I don't know whether you can realistically expect that even from IBM, so going with 14k points provides extra safety.

Note: you can estimate the (theoretical) relative error by taking a differential (treating all arguments as functions of a hidden variable): $d(xy) = x\,dy + y\,dx$. Switching to finite differences and dividing by the result [using the maximal result for a minimal error estimate] gives $\frac{d(xy)}{xy} = \frac{dx}{x} + \frac{dy}{y}$, so relative errors for multiplication add up. For a square root, $d(\sqrt{x}) = \frac{dx}{2\sqrt{x}}$, so its relative error is half the argument's error, but only if the intermediate carried enough extra bits.
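Both rules can be checked numerically by injecting a small known relative perturbation and watching how it propagates. A sketch; the perturbation $\delta = 10^{-8}$ and the inputs are arbitrary, chosen so $\delta$ dominates the machine epsilon:

```python
import math

x, y = 2.0, 3.0
delta = 1e-8                      # injected relative error, >> machine epsilon

# Multiplication: relative errors add (one delta from each factor).
p0 = x * y
p1 = (x * (1 + delta)) * (y * (1 + delta))
print(abs(p1 - p0) / p0 / delta)  # close to 2

# Square root: the relative error is halved.
s0 = math.sqrt(x)
s1 = math.sqrt(x * (1 + delta))
print(abs(s1 - s0) / s0 / delta)  # close to 0.5
```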