Floating point numbers, real numbers and machine precision


My math book states that when a real number $x$ is replaced by a floating-point number $F(x)$, the error between the two satisfies $|\text{error}|=|x-F(x)|\le\epsilon |x|$.

Now the book asks:

Consider two different, but mathematically equivalent expressions, having the value $C$ after evaluation. If we suspect that the computer satisfactorily evaluates the expressions for many input values within an interval, all to within machine-precision, why might we expect the difference of these expressions on a computer to have an error contained within an interval $[-\epsilon C,\epsilon C]$?

In the answer it is claimed that the computer will generate two numbers for both expressions: $C(1+\alpha)$ and $C(1+\beta)$ where $\alpha$ and $\beta$ are both positive and less than $\epsilon$.

Therefore indeed: $C(1+\alpha)-C(1+\beta)=C(\alpha-\beta)$ where $|\alpha-\beta|<\epsilon$

QUESTION: I don't see why $\alpha$ and $\beta$ have to be positive. This suggests that the computer would always generate a number that is larger than the real number. Is this true? Indeed, the statement $|\alpha-\beta|<\epsilon$ is only guaranteed when both are positive and less than $\epsilon$. So again: is it true that $\alpha$ and $\beta$ have to be positive? If not, does the statement that the error has to be contained within the interval $[-\epsilon C,\epsilon C]$ still hold, possibly for another reason?

There are 1 best solutions below

It is certainly true that the computed value could be less than the true value, so $\alpha$ and/or $\beta$ could be negative. It is not clear what "satisfactorily evaluates" means. Does it just mean executing the code properly? Does it mean that the result is within $[C(1-\epsilon),C(1+\epsilon)]$?

You can have subtractive cancellation that makes the relative error much larger than $\epsilon$. Suppose I want to compute $10^{-50}$. I can just type that into the computer and probably get an answer with a relative error of about $\epsilon$. I could also ask for $1-(1-10^{-50})$. These expressions are mathematically equivalent, but the numerical error of the second will be much higher: in standard 64-bit arithmetic there is not enough precision to distinguish $1-10^{-50}$ from $1$, so the result is $0$ rather than $10^{-50}$.
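If $\alpha$ and $\beta$ are allowed either sign with $|\alpha|,|\beta|\le\epsilon$, the triangle inequality gives only $|C(\alpha-\beta)|\le(|\alpha|+|\beta|)\,|C|\le 2\epsilon|C|$, so in general the difference lies in $[-2\epsilon C, 2\epsilon C]$ rather than $[-\epsilon C,\epsilon C]$. The following Python sketch illustrates both points made above. It assumes IEEE 754 double precision with round-to-nearest, which is what CPython uses on common platforms; the helper name `signed_relative_error` and the sample inputs are chosen here purely for illustration.

```python
from fractions import Fraction
import sys

# Assumption: IEEE 754 binary64 with round-to-nearest.
# sys.float_info.epsilon is 2**-52; the rounding bound for one number is half of that.
unit_roundoff = Fraction(sys.float_info.epsilon) / 2


def signed_relative_error(text):
    """Return (F(x) - x) / x exactly, where x is the decimal string `text`."""
    exact = Fraction(text)           # exact rational value of the decimal literal
    stored = Fraction(float(text))   # exact rational value of the stored double
    return (stored - exact) / exact


# Rounding can go in either direction, so alpha and beta can have either sign.
for text in ["0.1", "0.3", "0.7"]:
    err = signed_relative_error(text)
    direction = "above" if err > 0 else "below"
    print(f"F({text}) is {direction} {text}, relative error {float(err):.3e}")

# Subtractive cancellation: two mathematically equivalent expressions.
direct = 1e-50
cancelled = 1.0 - (1.0 - 1e-50)   # 1.0 - 1e-50 rounds to exactly 1.0
print("direct    =", direct)
print("cancelled =", cancelled)
print("relative error of cancelled =", abs(cancelled - direct) / direct)
```

In this sketch the stored value of `0.1` lands above the decimal value while `0.3` and `0.7` land below it, and the cancelled expression loses all of its significant digits even though each individual operation is correctly rounded.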