Trouble understanding floating-point representation


I had a quiz last week on floating-point representation. After my instructor graded it, he walked us through each step so that we could see what we did wrong. I took notes so that I could study his procedure, but no matter how hard I try, I can't get the same result he did. He said the answer is $100.125$.

Use a simple $16$-bit format for floating-point representation with $5$ bits for the exponent (with a bias of $15$), a normalized mantissa of $10$ bits, and a single bit for the sign. Show the decimal value represented by the computer as $0~01000~1001000010$

exponent = $5$

sign = $1$

mantissa = $10$

excess - m = $8$

bias = $15$

answer: $1001000.010 \cdot 2^{-7} = 100.125$

Can anyone explain how the answer is $100.125$?


This is a standard floating-point format called half-precision (IEEE 754 binary16): 1 sign bit, 5 exponent bits with a bias of 15, and 10 stored significand bits.


Let's invert the question and ask: How do you represent the number $100.125$ in this format?

The integer part ($100$), has the binary representation $1100100_2$. The fractional part ($.125$) is $\frac{1}{8} = 2^{-3}$, which is simply $.001_2$. Putting it together:

$$100.125_{10} = 1100100.001_2$$
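The conversion above can be sketched in Python: the integer part via `bin()`, the fractional part by repeated doubling (each doubling shifts one binary digit to the left of the point). The function name `to_binary` and the `max_frac_bits` cutoff (needed for fractions that never terminate in binary) are assumptions for illustration only.

```python
def to_binary(x: float, max_frac_bits: int = 16) -> str:
    """Binary expansion of a non-negative number (hypothetical helper)."""
    int_part = int(x)
    frac = x - int_part
    int_bits = bin(int_part)[2:]          # e.g. 100 -> '1100100'
    frac_bits = []
    # Repeated doubling: the integer part of 2*frac is the next binary digit
    while frac and len(frac_bits) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        frac_bits.append(str(bit))
        frac -= bit
    return int_bits + ("." + "".join(frac_bits) if frac_bits else "")

print(to_binary(100.125))  # 1100100.001
```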

But (most) floating-point numbers need to be “normalized” into scientific notation with exactly one nonzero digit to the left of the radix point, so:

$$100.125_{10} = 1.100100001_2 \times 2^6$$
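The normalization step can be checked with `math.frexp`, which returns $x = m \cdot 2^e$ with $0.5 \le m < 1$; doubling $m$ and decrementing $e$ gives the $1 \le s < 2$ form used here. The wrapper name `normalize` is hypothetical, and this sketch assumes a positive input.

```python
import math

def normalize(x: float) -> tuple[float, int]:
    """Return (s, e) with x == s * 2**e and 1 <= s < 2, for positive x."""
    m, e = math.frexp(x)   # frexp: x == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1    # shift one place so the significand lies in [1, 2)

s, e = normalize(100.125)
print(s, e)  # 1.564453125 6   (1.564453125 == 1.100100001 in binary)
```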

This number conveniently has exactly ten significant bits. So, you might think that we could straightforwardly encode the significand as 1100100001.

But there's a trick here: the first digit of any “normalized” binary number (except zero) is always 1. And the IEEE people decided that it would be a waste of memory to explicitly store a bit that's always 1. So instead, the format makes the leading 1 implied, and uses the bit saved this way to add one more bit of precision to the significand.

$$100.125_{10} = (1).1001000010_2 \times 2^6$$

So the significand is stored in memory as the bits 1001000010.
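Dropping the hidden 1 and keeping 10 bits can be reproduced exactly with `fractions.Fraction`. This is a minimal sketch assuming the value is already normalized with exponent 6 and needs no rounding (which happens to be true for $100.125$).

```python
from fractions import Fraction

FRAC_BITS = 10  # width of the stored significand field in this format

significand = Fraction(100.125) / 2**6   # 1.100100001 in binary, i.e. 801/512
fraction = significand - 1               # drop the implied leading 1
field = fraction * 2**FRAC_BITS          # scale so the stored bits form an integer
assert field.denominator == 1            # exact: no rounding needed for this value
print(format(int(field), f"0{FRAC_BITS}b"))  # 1001000010
```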

Next, let's deal with the exponent. It's 6, but since the format has an exponent bias of 15, we store it as 6 + 15 = 21, or 10101 in binary.
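Adding the bias is a single addition; the sketch below also checks the range, assuming IEEE 754 binary16 conventions where the stored field values 0 (zero/subnormals) and 31 (infinity/NaN) are reserved.

```python
BIAS = 15      # exponent bias of this format
EXP_BITS = 5   # width of the exponent field

true_exponent = 6
stored = true_exponent + BIAS            # 21
# Assumption: fields 0 and 2**EXP_BITS - 1 are reserved, as in IEEE 754 binary16
assert 0 < stored < 2**EXP_BITS - 1
print(stored, format(stored, f"0{EXP_BITS}b"))  # 21 10101
```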

Finally, since the number is positive, the sign bit is 0.

Putting this all together, the half-precision representation of $100.125$ is 0 10101 1001000010. (Or in the more compact hexadecimal representation, 5642.)
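The three fields can be packed into one 16-bit word with shifts, and cross-checked against Python's standard `struct` module, whose `'e'` format code is IEEE 754 half-precision. The variable names are illustrative.

```python
import struct

sign, exponent_field, fraction_field = 0, 21, 578   # 0, 10101, 1001000010

# Layout: 1 sign bit | 5 exponent bits | 10 fraction bits
word = (sign << 15) | (exponent_field << 10) | fraction_field
print(f"{word:016b} {word:04X}")  # 0101011001000010 5642

# Cross-check against Python's own half-precision packing
assert struct.pack(">e", 100.125) == word.to_bytes(2, "big")
```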


OK, so now let's consider the actual bit pattern given:

0 01000 1001000010

  • Sign bit 0 = +
  • Exponent 01000 = 8. Subtracting the bias of 15 gives an actual exponent of -7.
  • Significand 1001000010 = $(1).1001000010_2 = 1 + 2^{-1} + 2^{-4} + 2^{-9} = \frac{801}{512}$. Gee, this looks familiar.
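The field-by-field reading above can be sketched as a small decoder. `decode_fields` is a hypothetical helper; it handles normalized numbers only (it assumes the exponent field is neither all 0s nor all 1s) and returns the significand as an exact fraction.

```python
from fractions import Fraction

def decode_fields(bits: str, exp_bits: int = 5, frac_bits: int = 10, bias: int = 15):
    """Split a sign|exponent|fraction bit string into
    (sign, unbiased exponent, significand with the hidden 1 restored)."""
    bits = bits.replace(" ", "")
    assert len(bits) == 1 + exp_bits + frac_bits
    sign = int(bits[0])
    exponent = int(bits[1:1 + exp_bits], 2) - bias
    significand = 1 + Fraction(int(bits[1 + exp_bits:], 2), 2**frac_bits)
    return sign, exponent, significand

print(decode_fields("0 01000 1001000010"))
# (0, -7, Fraction(801, 512))
```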

Therefore, the number has a value of $+\frac{801}{512} \times 2^{-7} = \frac{801}{65536} = 0.0122222900390625$.
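This arithmetic can be confirmed two ways: exactly with `fractions.Fraction`, and independently by letting `struct` (format code `'e'`, IEEE 754 half-precision) interpret the same 16 bits.

```python
import struct
from fractions import Fraction

value = Fraction(801, 512) * Fraction(1, 2**7)   # significand * 2**exponent
print(value, float(value))  # 801/65536 0.0122222900390625

# Independent check: decode the given bit pattern as an IEEE 754 half
word = int("0" "01000" "1001000010", 2)            # 0x2242
(decoded,) = struct.unpack(">e", word.to_bytes(2, "big"))
assert decoded == float(value)
```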

Note that this value is not equal to $100.125$. It is, in fact, exactly $\frac{1}{2^{13}}$ of $100.125$.
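The factor of $2^{13}$ can be checked exactly, and it matches the gap between the stored exponent field needed for $100.125$ (10101) and the one actually given (01000). A minimal sketch:

```python
from fractions import Fraction

intended = Fraction(100.125)      # exactly 801/8
given = Fraction(801, 65536)      # value of 0 01000 1001000010
ratio = intended / given
print(ratio, ratio == 2**13)      # 8192 True

# Same significand, so the whole discrepancy comes from the exponent fields
print(0b10101 - 0b01000)          # 13  (21 - 8)
```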

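So it seems that your instructor is wrong. They got the significand correct (including the hidden-1 trick), but the exponent is off: representing $100.125$ requires the stored exponent 10101 (21), whereas the given pattern has 01000 (8), which is 13 smaller, hence the factor of $2^{13}$.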