Does the rounding unit of a floating point system depend only on the mantissa?


The rounding unit (or machine epsilon) of a binary floating point system is usually given as $\frac{2^{-(p - 1)}}{2}$ or simply as $2^{-(p - 1)}$, according to this Wikipedia article (if I'm not mistaken): I'm not sure yet why it has two values...

So it seems that it depends only on the number of bits in the mantissa, but I can't really visualize (and therefore fully understand) why that is.

Could you please explain to me why the exponent has nothing to do with the rounding unit?

(Maybe I should go to sleep...)



Because the machine epsilon is the smallest quantity you can add to $1$ and get a result that differs from $1$. So in a sense the definition does not say the exponent has nothing to do with it; rather, it says the exponent is fixed at the special value $0$.

BTW, other definitions of the machine epsilon use the smallest quantity you can subtract from $1$ and get a result that differs from $1$. The two definitions differ by a factor of $2$.
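You can find both quantities experimentally. Here is a small Python sketch (my own illustration, assuming IEEE 754 double precision, i.e. $t = 52$ fraction bits and round-to-nearest-even): it halves a candidate epsilon until adding (or subtracting) it no longer changes $1.0$.

```python
# Find the smallest power of two that, when added to (or subtracted
# from) 1.0, still yields a result different from 1.0. This shows the
# factor-of-2 gap between the two definitions of machine epsilon.

def smallest_eps(op):
    eps = 1.0
    # Keep halving while the halved value still perturbs 1.0.
    while op(1.0, eps / 2) != 1.0:
        eps /= 2
    return eps

add_eps = smallest_eps(lambda a, e: a + e)   # 2**-52 in double precision
sub_eps = smallest_eps(lambda a, e: a - e)   # 2**-53 in double precision

print(add_eps, sub_eps, add_eps / sub_eps)
```

The subtraction variant is smaller because the spacing of floating point numbers just below $1$ is half the spacing just above $1$ (the exponent drops by one at the boundary).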


Because the machine epsilon is defined in terms of the "relative error", and the relative error is normalized by the magnitude of the number, so the exponent cancels out and does not affect it.
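You can see this cancellation directly: the gap between consecutive floating point numbers (the ulp) grows with the exponent, but the gap *relative to the number itself* is constant. A quick Python sketch (assuming IEEE 754 double precision, $t = 52$; `math.ulp` requires Python 3.9+):

```python
import math

# The absolute spacing math.ulp(x) scales with x's exponent, but the
# relative spacing math.ulp(x) / x is always 2**-52 in double precision:
# this is why the rounding unit depends only on the mantissa width.
for x in [1.0, 8.0, 1024.0, 2.0**100]:
    print(x, math.ulp(x), math.ulp(x) / x)
```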

Let $x$ be a real number and let $y = RN(x)$ be its rounding to $t$ mantissa digits. Assuming rounding to nearest (rather than truncation), you can easily prove that

$$ \epsilon_{abs} = |x - y| \leq 2^{e_x - t - 1} $$

($e_x$ is the exponent of the number $x$). If we compute the relative error instead we have

$$ \epsilon_{rel} = \frac{|x - y|}{|x|} \leq \frac{2^{e_x - t - 1}}{|x|} \leq \frac{2^{e_x - t - 1}}{2^{e_x}} = 2^{-t-1} $$

The quantity $2^{-t-1}$ is what we define as the machine epsilon. If instead of "rounding" you assume "truncation", you get $2^{-t}$ as the bound.