Binary floating point: performing subtraction and addition

60 Views Asked by Bumbble Comm At 10 May 2026 - 7:01

If $x = 1.0\mathrm{e}38 = 1.0 \times 10^{38}$ and $y = 3.0$,
I want to find $(x-x)+y$ and $(x+y)-x$.
I think the value of $(x-x)+y$ is straightforward: $x-x=0$, and then $0+y=3.0$.
But how can I perform the addition in $(x+y)-x$ when the two numbers are written in different bases?
I think the idea is to compute $x+y$ first and then subtract $x$, all in floating point. I tried converting $y=3.0$ to binary, which gives $1.1 \times 2^1$ (with the mantissa $1.1$ written in binary),
but how do I convert $10^{38}$ to binary?


There are 2 best solutions below

Bumbble Comm On 31 Jul 2019 - 2:05 BEST ANSWER

You have to define what floating point format you are using. Standard IEEE $64$-bit floating point assigns $53$ bits to the mantissa, giving a precision of about $16$ decimal digits. $10^{38}$ will be represented by some $53$-bit mantissa times a power of two. To add $3.0$ to it, you have to match the exponents, and $3.0$ will be shifted so far to the right that all of its bits fall off the end of the mantissa. That means that in floating point $10^{38}+3.0=10^{38}$ exactly, and when you subtract off $10^{38}$ you will get $0$. This will happen whenever the larger number is roughly $10^{16}$ or more times the smaller.
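A minimal sketch of this in Python, assuming the interpreter's `float` is an IEEE $64$-bit double (true on common platforms, but an assumption here). The variable names `m`, `e`, `gap`, `left` and `right` are just illustrative. It uses `math.frexp` to split $x$ into a binary mantissa and exponent, `math.ulp` (Python 3.9+) to get the spacing between neighbouring doubles near $x$, and then evaluates both expressions from the question.

```python
import math

x = 1.0e38
y = 3.0

# Decompose x as m * 2**e with 0.5 <= m < 1 (binary mantissa and exponent).
m, e = math.frexp(x)

# Spacing between x and the next larger double (math.ulp needs Python 3.9+).
gap = math.ulp(x)

left = (x - x) + y    # x - x is exactly 0, so y survives
right = (x + y) - x   # y is far below gap / 2, so x + y rounds back to x

print("mantissa, exponent:", m, e)
print("gap near x:", gap)
print("(x - x) + y =", left)
print("(x + y) - x =", right)
```

The gap near $x$ comes out as $2^{74}$, which is about $1.9 \times 10^{22}$, so adding $3.0$ cannot move $x$ to a different representable value.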

Bumbble Comm On 31 Jul 2019 - 1:30

I think it is a moot point. Unless you are willing to use an 'unlimited precision' software package, for any floating-point system with a reasonable number of digits in the mantissa (in any base you like), the digits of $3.0$ will be shifted so far to the right of the radix point of $1.0 \times 10^{38}$ that they are dropped completely. Hence the calculated value of $(x+y)-x$ will end up as zero.
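For comparison, here is a small sketch of the 'unlimited precision' route, using Python's standard `fractions.Fraction` as one possible stand-in for such a package (my choice for illustration, not something the answer prescribes). It assumes, as before, that `float` is an IEEE $64$-bit double; the names `exact` and `rounded` are hypothetical.

```python
from fractions import Fraction

# Exact rational arithmetic: no digits are ever dropped.
x_exact = Fraction(10) ** 38
y_exact = Fraction(3)
exact = (x_exact + y_exact) - x_exact

# The same expression in ordinary floating point.
x_float = 1.0e38
y_float = 3.0
rounded = (x_float + y_float) - x_float

print("exact:", exact)
print("floating point:", rounded)
```

With exact arithmetic the result is $3$, while the floating-point version loses $y$ entirely.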
