Suppose I want to compute the following floating-point subtraction in binary: 0.35 - 0.62
I can work through the steps, but I cannot figure out how the sign bit of the result is determined.
1) First, we write the numbers in binary and normalize them. Assume that the fraction part can hold 4 bits (extra bits are truncated):
0.35 -> 0.010110... -> 1.0110 * 2^(-2)
0.62 -> 0.1001111.... -> 1.0011 * 2^(-1)
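Step 1 can be reproduced with a small Python sketch. It assumes truncation (not rounding) to 4 fraction bits, which is what the values above reflect. It uses `fractions.Fraction` so that 0.35 and 0.62 are handled exactly rather than as already-rounded binary floats. `normalize` is a hypothetical helper name, not part of any library.

```python
from fractions import Fraction

FRAC_BITS = 4  # assumption from the question: 4 bits after the binary point


def normalize(x: Fraction, frac_bits: int = FRAC_BITS) -> tuple[str, int]:
    """Return ('1.bbbb', exponent) for x > 0, truncating any extra bits."""
    exponent = 0
    # Scale x into [1, 2) by powers of two, tracking the exponent.
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    # Generate fraction bits by repeated doubling (truncation, no rounding).
    frac = x - 1
    bits = ""
    for _ in range(frac_bits):
        frac *= 2
        bit = int(frac >= 1)
        bits += str(bit)
        frac -= bit
    return "1." + bits, exponent


print(normalize(Fraction("0.35")))  # ('1.0110', -2)
print(normalize(Fraction("0.62")))  # ('1.0011', -1)
```

With rounding to nearest instead of truncation, 0.62 would become 1.0100 * 2^(-1), so the choice matters for the final bits.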
2) We have to raise the smaller exponent to match the larger one, which means shifting the mantissa of 0.35 one place to the right. So:
1.0110 * 2^(-2) = 0.1011 * 2^(-1)
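A minimal sketch of the alignment step, assuming each significand is stored as an integer scaled by 2^4 (so 1.0110 is 0b10110) and that bits shifted past the 4th fraction bit are simply dropped. `align` and `fmt` are hypothetical helper names.

```python
FRAC_BITS = 4  # bits after the binary point, as assumed in the question


def align(sig: int, exp: int, target_exp: int) -> int:
    """Shift a significand right so its exponent becomes target_exp.

    sig is the significand scaled by 2**FRAC_BITS (1.0110 -> 0b10110);
    bits shifted past the last fraction bit are simply dropped.
    """
    return sig >> (target_exp - exp)


def fmt(sig: int) -> str:
    """Render a (FRAC_BITS + 1)-bit significand as 'b.bbbb'."""
    s = format(sig, f"0{FRAC_BITS + 1}b")
    return s[0] + "." + s[1:]


aligned = align(0b10110, -2, -1)  # 1.0110 * 2^(-2) moved to exponent -1
print(fmt(aligned))  # 0.1011
```

Here the bit shifted out is a 0, so the alignment happens to be exact.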
3) Subtract the mantissas:
    0.1011
  - 1.0011
  --------
{1} 1.1000
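{1} means there is a carry (a borrow, since this is a subtraction) out of the leftmost bit. The new mantissa is 1.1000. So how do we determine the sign bit? We can also do the same operation using 2's complement, adding the two's complement of 1.0011, which is 0.1101: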
    0.1011
  + 0.1101
  --------
{0} 1.1000
The carry bit is zero. So is the result positive or negative? How is the sign determined?
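The 2's-complement route can be sketched in Python, assuming both mantissas are treated as 5-bit unsigned magnitudes (1 integer bit plus 4 fraction bits) with exponents already aligned. It relies on a standard property of unsigned subtraction done as a + ~b + 1: a carry-out of 1 means no borrow (a >= b), while a carry-out of 0 means a < b, so the difference is negative and the leftover bits are its two's complement, which must be negated again to get the magnitude. `WIDTH`, `fmt`, and `sub_via_twos_complement` are illustrative names.

```python
WIDTH = 5                # 1 integer bit + 4 fraction bits, as in the question
MASK = (1 << WIDTH) - 1  # 0b11111


def fmt(bits: int) -> str:
    """Render a WIDTH-bit pattern as 'b.bbbb'."""
    s = format(bits, f"0{WIDTH}b")
    return s[0] + "." + s[1:]


def sub_via_twos_complement(a: int, b: int) -> tuple[int, int, int]:
    """Compute a - b for unsigned WIDTH-bit mantissas as a + ~b + 1.

    Returns (carry_out, raw_bits, magnitude). A carry-out of 1 means no
    borrow (a >= b). A carry-out of 0 means a < b: the difference is
    negative and raw_bits is its two's complement, so it is negated back.
    """
    # a + ~b + 1 in one addition, so b == 0 still produces a carry of 1.
    total = a + (~b & MASK) + 1
    carry_out = total >> WIDTH       # the bit that falls off the top
    raw = total & MASK               # the WIDTH-bit pattern left behind
    magnitude = raw if carry_out else (~raw + 1) & MASK
    return carry_out, raw, magnitude


a = 0b01011  # 0.1011, the aligned mantissa of 0.35
b = 0b10011  # 1.0011, the mantissa of 0.62
carry, raw, mag = sub_via_twos_complement(a, b)
sign = 0 if carry else 1  # carry 0 -> a < b -> result is negative
print(carry, fmt(raw), sign, fmt(mag))  # 0 1.1000 1 0.1000
```

Under this reading, the pattern 1.1000 is the two's complement of 0.1000, so the magnitude to store is 0.1000 * 2^(-1), which renormalizes to 1.0000 * 2^(-2).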
4) Assuming we already know the sign is negative, the result is 1.1000 * 2^(-1).
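For comparison, here is a sketch of the sign-magnitude approach often used to describe floating-point adders (not a claim about any particular hardware). Instead of inspecting a carry, it flips the subtrahend's sign (a - b = a + (-b)), aligns exponents, subtracts the smaller aligned magnitude from the larger, and gives the result the sign of the larger-magnitude operand. The `(sign, sig, exp)` tuple layout, the name `fp_sub`, and truncation of shifted-out bits are assumptions for illustration; rounding, guard bits, overflow, and special values are ignored.

```python
FRAC_BITS = 4            # fraction bits kept, as assumed in the question
HIDDEN = 1 << FRAC_BITS  # 0b10000, i.e. 1.0000


def fp_sub(a, b):
    """Return a - b for toy sign-magnitude floats.

    Each value is a tuple (sign, sig, exp) meaning
    (-1)**sign * (sig / 2**FRAC_BITS) * 2**exp, with sig a normalized
    integer such as 0b10110 for 1.0110. Shifted-out bits are truncated.
    """
    sign_a, sig_a, exp_a = a
    sign_b, sig_b, exp_b = b
    sign_b ^= 1  # a - b == a + (-b): flip the subtrahend's sign

    # Align exponents by shifting the smaller operand's significand right.
    if exp_a < exp_b:
        sig_a >>= exp_b - exp_a
        exp = exp_b
    else:
        sig_b >>= exp_a - exp_b
        exp = exp_a

    if sign_a == sign_b:
        # Same sign: add magnitudes, keep the common sign.
        sig, sign = sig_a + sig_b, sign_a
    elif sig_a >= sig_b:
        # Different signs: the larger magnitude decides the result's sign.
        sig, sign = sig_a - sig_b, sign_a
    else:
        sig, sign = sig_b - sig_a, sign_b

    if sig == 0:
        return (0, 0, 0)  # exact zero, treated as +0 in this sketch

    # Renormalize so that 1.0000 <= significand < 10.0000 (binary).
    while sig >= 2 * HIDDEN:
        sig >>= 1
        exp += 1
    while sig < HIDDEN:
        sig <<= 1
        exp -= 1
    return (sign, sig, exp)


x = (0, 0b10110, -2)  # +1.0110 * 2^(-2), i.e. 0.35 truncated
y = (0, 0b10011, -1)  # +1.0011 * 2^(-1), i.e. 0.62 truncated
sign, sig, exp = fp_sub(x, y)
print(sign, format(sig, "05b"), exp)  # 1 10000 -2
```

This gives -1.0000 * 2^(-2) = -0.25; the exact answer, -0.27, is lost to truncation at 4 fraction bits.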
Any ideas on how to determine the sign bit?