Machine Precision for Roundoff


Consider the machine precision (machine epsilon), defined as the smallest $\epsilon_{mach}$ such that $1 + \epsilon_{mach} > 1$. I found an explanation online showing that, for rounding, $\epsilon_{mach}$ is bounded above by $\frac{1}{2}\beta^{1-k}$, where $\beta$ is the base of the computer system and $k$ is the length of the mantissa; the derivation is shown below. What I struggle to understand is the step that says "with borrowing during the subtraction, we get", highlighted below in red. How does this step happen, and what is the explanation behind it? And where do $(\beta - d_{k+1})$ and $(\beta - d_{k+2})$ come from?

[Image: the derivation from the text, with the step "with borrowing during the subtraction, we get" highlighted in red.]
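For concreteness, these quantities can be checked numerically. The sketch below assumes IEEE double precision ($\beta = 2$, $k = 53$), so the bound is $\frac{1}{2}\beta^{1-k} = 2^{-53}$; with the default round-to-nearest, adding exactly $2^{-53}$ to $1$ ties and rounds back to $1$, consistent with the bound being an upper bound:

```python
import sys

# For IEEE double precision: beta = 2, k = 53 mantissa bits,
# so the bound (1/2) * beta**(1 - k) equals 2**-53.
u = 0.5 * 2.0 ** (1 - 53)

print(sys.float_info.epsilon)    # 2**-52, the gap from 1.0 to the next double
print(1.0 + u == 1.0)            # True: half a gap ties and rounds back to 1.0
print(1.0 + 2.0 ** -52 > 1.0)    # True: a full gap is representable
```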

Best Answer

The symbols $d_i$ and $e_i$ are not standardized (as far as I know) and technically should be defined here. (And they were apparently defined in the original text, if we can believe the phrase "Using the above notation".) But I guess that the digits $d_i$ are the digits of the infinite-precision representation of $x$, whereas $e_i$ are the digits of the finite-precision representation in which $x$ is stored in the computer, and the exact value of that finite-precision representation is $fl(x).$

Under that interpretation, the phrase "borrowing during that subtraction" may be meant to remind you that although $e_k = d_k + 1,$ the $k$th digit after subtraction is not $e_k - d_k = 1$ but $0.$ It is the same reason why $1200 - 1100 = 100$ but $1200 - 1110 = 90$; in the first case we get a $1$ in the hundreds place because $2 - 1 = 1,$ but in the second case we do not get a $1$ in the hundreds place because we had to "borrow" it in order to deal with the fact that we had $10$ in the last two digits we were subtracting but only $00$ in the last two digits we were subtracting from.
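The borrowing can be traced digit by digit in code. This is a sketch for illustration only (the digit lists and base are my own example, not from the text); note how the borrow turns a lower digit $d$ into $\beta - d$ (here $10 - 1 = 9$), which is exactly where the $(\beta - d_{k+1})$ terms come from:

```python
def sub_digits(a, b, beta=10):
    """Digit-by-digit subtraction a - b in base beta, tracking borrows.

    a and b are equal-length digit lists, most significant digit first,
    with a >= b so the result is nonnegative.
    """
    result = []
    borrow = 0
    for da, db in zip(reversed(a), reversed(b)):
        d = da - db - borrow
        if d < 0:
            d += beta   # borrow from the next higher digit: d becomes beta - (db - da + borrow)
            borrow = 1
        else:
            borrow = 0
        result.append(d)
    return list(reversed(result))

print(sub_digits([1, 2, 0, 0], [1, 1, 0, 0]))  # [0, 1, 0, 0]: 1200 - 1100 = 100
print(sub_digits([1, 2, 0, 0], [1, 1, 1, 0]))  # [0, 0, 9, 0]: 1200 - 1110 = 90, borrow eats the 1
```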

But I find this entire explanation to be far more complicated than it needs to be. In my mind, the reason why $\lvert x - fl(x)\rvert \leq \frac12 \beta^{n-k}$ is simply that the next larger finite-precision number above $fl(x)$ is $fl(x) + \beta^{n-k}$ and the next smaller finite-precision number below $fl(x)$ is $fl(x) - \beta^{n-k}$ (or in the special case where $fl(x)$ is exactly a power of $\beta$, the next smaller number is $fl(x) - \frac12 \beta^{n-k}$).
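This neighbor-spacing picture can be checked directly with `math.nextafter` (Python 3.9+), again assuming IEEE doubles, where the spacing near $1$ is $2^{-52}$:

```python
import math

x = 1.5                                        # a double strictly between powers of 2
gap_up = math.nextafter(x, math.inf) - x       # distance to the next larger double
gap_down = x - math.nextafter(x, -math.inf)    # distance to the next smaller double
print(gap_up == gap_down == 2.0 ** -52)        # True: uniform spacing away from powers of beta

# Special case: at an exact power of beta the spacing below is halved.
p = 1.0
print(math.nextafter(p, math.inf) - p == 2.0 ** -52)   # True
print(p - math.nextafter(p, 0.0) == 2.0 ** -53)        # True
```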

So (assuming positive $x$) you cannot have $x > fl(x) + \frac12 \beta^{n-k}$ because any number greater than $fl(x) + \frac12 \beta^{n-k}$ would have been rounded up to $fl(x) + \beta^{n-k}$ or to something even greater; whereas any number less than $fl(x) - \frac12 \beta^{n-k}$ would have been rounded down to $fl(x) - \beta^{n-k}$ (or $fl(x) - \frac12 \beta^{n-k}$, if that is a finite-precision number) or to something even less. Therefore $fl(x) - \frac12 \beta^{n-k} \leq x \leq fl(x) + \frac12 \beta^{n-k},$ which is equivalent to $$\lvert x - fl(x)\rvert \leq \frac12 \beta^{n-k}.$$ Similarly for negative $x$ any number that is outside the interval from $fl(x) - \frac12 \beta^{n-k}$ to $fl(x) + \frac12 \beta^{n-k}$ would have been rounded up or down to a number different from $fl(x).$
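The half-spacing bound can also be verified exactly with rational arithmetic: `Fraction` represents $x$ and $fl(x)$ without error, and `math.ulp` (Python 3.9+) gives the local spacing $\beta^{n-k}$ at $fl(x)$, so the check below is exact, not subject to further rounding. This is a sketch assuming IEEE doubles; the sample values are arbitrary:

```python
import math
from fractions import Fraction

def roundoff_error(x):
    """Return the exact error |x - fl(x)| and the half-spacing bound at fl(x)."""
    fl = float(x)                       # round-to-nearest double
    err = abs(x - Fraction(fl))         # Fraction(float) is exact, so err is exact
    bound = Fraction(math.ulp(fl)) / 2  # half the local spacing beta**(n-k)
    return err, bound

for x in [Fraction(1, 3), Fraction(1, 10), Fraction(22, 7)]:
    err, bound = roundoff_error(x)
    print(err <= bound)                 # True each time
```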