Converting 0.1 to a binary 64-bit double


I want to convert the decimal number 0.1 to a binary 64-bit double. Here is how I do it:

$$ 0.1_{10} = 0.00011001100110011001100110011001100110011001100110011001100110... \times 2^0 $$

Represent it in the scientific form:

$$ 1.1001100110011001100110011001100110011001100110011001100110... \times 2^{-4} $$

A 64-bit IEEE 754 double stores 52 bits of mantissa (the leading 1 is implicit), so I need to round the fractional part to 52 bits.

$$ 1.\underbrace{1001100110011001100110011001100110011001100110011001}_{\text{52 bits}}100110... \times 2^{-4} $$

So I have to round to either:

smaller number (truncated)

$$ 1.1001100110011001100110011001100110011001100110011001 $$

larger number (truncated number plus one unit in the last place)

$$ 1.1001100110011001100110011001100110011001100110011010 $$

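Since the 53rd bit is 1 and the bits after it are not all zero, the number lies above the halfway point between the two candidates, so I round up to the larger number. That gives me the mantissa. Next I calculate the biased exponent (the exponent field has 11 bits):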

$$ \begin{aligned} 2^{11-1} - 1 &= 1023 \\ 1023 - 4 &= 1019 \\ 1019_{10} &= 01111111011_2 \end{aligned} $$

So the final representation should be: $$ \underbrace{0}_{\text{sign}}\underbrace{01111111011}_{\text{exponent}}\underbrace{1001100110011001100110011001100110011001100110011010}_{\text{mantissa}} $$

Is this correct?


There is 1 best solution below


You can check this with a short piece of C code:

double x = 0.1;
long long n = *(long long*)&x;
printf("%llX",n);

This prints 3FB999999999999A, which matches your result. In binary:

0011 1111 1011 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010

For the record, the pointer cast violates C's strict aliasing rule, so I cannot recommend this programming method; copying the bytes with memcpy is the well-defined alternative.