How to interpret fractional number of bits of precision


In the double-precision floating-point format there are effectively $53$ bits of mantissa ($52$ stored explicitly plus one implicit leading bit). This lets us estimate the maximum number of decimal digits of precision available: $$N_{max}=\log_{10}2^{53}\approx15.955.$$
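The estimate above is easy to reproduce; a minimal sketch in Python:

```python
# Decimal digits of precision implied by a 53-bit mantissa:
# N_max = log10(2**53) = 53 * log10(2)
import math

n_max = 53 * math.log10(2)
print(round(n_max, 3))  # 15.955
```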

Of course, I do understand that this estimate implies that at least $15$ decimal digits are guaranteed to be stored in this format. But the actual number is quite close to $16$, and it seems to me that in most cases we could somehow "extract" this extra digit and hope that it is mostly correct.

But strictly speaking, what does it really mean that we have an additional $0.955$ digits of precision? Does it mean that there are individual numbers which can't be stored with the full $16$ digits of precision, but that for most numbers the precision will be $16$ digits? Or does it just mean that we must use some particular rounding method to always come up with $16$ digits of precision? Or something else entirely?
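One concrete case where the $16$th digit is lost, sketched in Python: near $2^{53}$ consecutive doubles are spaced $2$ apart, so the $16$-digit integer $2^{53}+1$ falls exactly between two doubles and cannot survive the round trip.

```python
# 2**53 + 1 = 9007199254740993 has 16 decimal digits, but near 2**53
# consecutive doubles are 2 apart, so conversion rounds it to the
# even neighbour 2**53 and the 16th digit is lost.
x = 9007199254740993          # a 16-digit integer, equal to 2**53 + 1
print(float(x))               # 9007199254740992.0
print(float(x) == 2.0**53)    # True
```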

There are 3 answers below.

Best answer:

Consider a simpler example: a floating-point number with a $6$-bit mantissa. Its precision would be about $\log_{10}2^6\approx1.8$ decimal digits. This actually means that if we try to represent all the numbers possible with a full $2$-decimal-digit number, we'll get wrong results after rounding. For example, $8.2$ can't be represented quasi-exactly in such a number: its binary expansion is $1000.\overline{0011}_2$, and rounding in either direction to a $6$-bit value gives either $1000.00_2=8.0$ or $1000.01_2=8.25$, which rounds (half up) to $8.3$.
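The toy example above can be checked numerically. This is a sketch, where `round_to_bits` is a hypothetical helper (not a standard function) that rounds a value to a given number of significant binary digits:

```python
import math

def round_to_bits(x, bits):
    """Round x to `bits` significant binary digits (round to nearest)."""
    e = math.floor(math.log2(abs(x)))   # exponent of the leading bit
    scale = 2.0 ** (bits - 1 - e)       # place the last kept bit at 2**0
    return round(x * scale) / scale

print(round_to_bits(8.2, 6))  # 8.25 -- the nearest 6-bit value is not 8.2
```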

Thus the answer is that a fractional number of digits of precision means that not all numbers can be represented in such a way that, after rounding the floating-point representation of a number to the expected number of decimal digits, we'd get the original number back.

Another answer:

It's worse. The exact values representable with $53$-bit binary numbers differ from the exact values representable with $16$-digit decimals: if you convert a $53$-bit number exactly, you may end up with dozens of decimal digits. So we agree that in both cases intervals of numbers are represented, e.g. that $3.1415926$ stands for $3.1415926\pm5\cdot 10^{-8}$, and during conversion we want to find a decimal interval that best matches a given binary interval (or vice versa). Because of this, if we only know that the "true" value of a number lies within a certain binary interval, we may end up with a decimal representative that is right for the representing value but not for the "true" value, i.e. we accumulate two rounding errors.

What we have in any case is that there are many more intervals on the binary side than on the decimal side if we use $15$ decimals, hence usually almost ten binary representatives fall into the same decimal interval. On the other hand, in part of each decade the binary representations are spaced more widely than $16$-digit decimals, hence any conversion routine will leave out some possible $16$-digit numbers (and, by the above, it may still be nontrivial to pick the right $16$-digit decimal in all other cases).
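The claim that an exact conversion may need dozens of decimal digits can be checked directly: Python's `decimal.Decimal` constructor, when given a float, produces the exact decimal value of the underlying double.

```python
# The exact decimal expansion of the double nearest to 0.1
# has 55 digits after the decimal point.
from decimal import Decimal

exact = Decimal(0.1)        # exact value of the binary double, not "0.1"
print(exact)
print(len(str(exact)) - 2)  # fractional digits: 55
```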

Another answer:

Since $10 = 5 \cdot 2$ and $2$ and $5$ are prime, a decimal fraction whose reduced denominator contains the factor $5$ has no finite binary expansion, so we can never be sure how well decimal numbers will be represented using the ordinary binary IEEE float formats.
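A classic illustration of this in Python: neither $0.1$ nor $0.2$ has a finite binary expansion, so arithmetic on their nearest doubles visibly drifts.

```python
# 0.1 and 0.2 are stored as the nearest doubles, and the small
# representation errors surface in the sum.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```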

However, every $10$ bits gets "quite close" to $3$ decimal digits, since $1024$ is almost $1000$. This fact can actually be used to store decimal digits in binary: we accept losing $24$ of the $1024$ combinations in order to store exactly $3$ decimal digits in $10$ bits (the "densely packed decimal" encoding of the IEEE 754 decimal formats works this way). Even less efficient would be plain binary-coded decimal: using $4$ bits to code one decimal digit $0$–$9$ at a time, losing $6$ of the $2^4=16$ combinations.
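The efficiency comparison can be made precise. A decimal digit carries $\log_2 10\approx3.32$ bits of information; packing $3$ digits into $10$ bits costs $\approx3.33$ bits per digit, while plain BCD costs $4$:

```python
# Bits spent per decimal digit under the schemes discussed above.
import math

ideal = math.log2(10)   # information-theoretic minimum, ~3.322 bits
packed = 10 / 3         # 3 digits in 10 bits, ~3.333 bits
bcd = 4.0               # one digit per 4-bit nibble

for name, bits in [("ideal", ideal), ("10 bits / 3 digits", packed), ("BCD", bcd)]:
    print(f"{name:>20}: {bits:.3f} bits per digit")
```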