Convert a number to an IEEE 754 floating-point number


I need to convert $-10.625$ to a 32-bit IEEE 754 floating-point number.

I set the sign bit to $1$ because the number is negative.

Then I converted $(10)_{10}$ to $(00001010)_2$.

After that I converted $0.625$ to binary:

$$\begin{aligned} 0.625 \cdot 2 &= 1.25 \rightarrow 1 \\ 0.25 \cdot 2 &= 0.5 \rightarrow 0 \\ 0.5 \cdot 2 &= 1.0 \rightarrow 1 \end{aligned}$$

So my fixed-point number is $00001010.101_2$. To get a one at the beginning, I set the exponent to $4$. Finally, adding the bias gives $4 + 127 = (131)_{10} = (10000011)_2$.

My resulting floating-point number is: $1 \quad 10000011 \quad 1010101$.

Unfortunately, it is wrong.

Question: Where is my mistake?

Best answer

You have at least three problems there.

First, the exponent needed to make $1010.101_2$ into $1.010101_2$ is $3$, not $4$. Numbers with an exponent (before bias) of $0$ are in the interval $[1,2)$, not $[\frac12,1)$. The biased exponent is therefore $3 + 127 = (130)_{10} = (10000010)_2$.

Second, since the bit of the mantissa before the binary point is always $1$ for normalized numbers, storing it would waste space, so the single-precision format leaves it implicit (zero and denormals are handled by a special encoding). The stored fraction therefore starts with $010101$, not $1010101$.

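Finally, you're missing the last 16 bits of the mantissa: the fraction field is 23 bits wide, so it must be padded with zeros on the right to the full width.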

So you should have gotten 1 10000010 01010100000000000000000 (or hex C12A0000) instead.


The computer agrees:

$ cat > foo.c
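#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(void) {
  float s = -10.625f;
  uint32_t bits;
  memcpy(&bits, &s, sizeof bits);  /* copy the bytes instead of casting the pointer */
  printf("%x\n", (unsigned)bits);
  return 0;
}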
$ gcc -O0 foo.c
$ ./a.out
c12a0000
$