Floating Point Arithmetics

64 Views Asked by Bumbble Comm At 27 Mar 2026 - 7:11

I have been experimenting to understand floating-point arithmetic. I have a 64-bit processor, and I have set Matlab to `format longE`, which should display floating-point numbers in double precision.

I see that $$3.16229 - 3.16228 = 1.000000000006551e-05$$ while $$0.316229 \times 10^3 - 0.316228 \times 10^3 = 9.999999999763531e-04$$

I am not able to understand the difference. Is it because these numbers are converted to a binary representation for the calculation and then truncated to 52 bits? If so, why do the two expressions above give different errors?

There is 1 best solution below

Bumbble Comm On 02 Aug 2020 - 6:14

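Yes, that is essentially correct. None of these numbers can be represented exactly in binary floating point, so each one is rounded to the nearest double (a 53-bit significand: 52 stored fraction bits plus one implicit leading bit). This rounding error becomes visible in the difference, because subtracting two nearly equal numbers cancels the leading digits and leaves the error behind. As the factor 10 is not a power of 2, the binary representations in the second formula differ from those in the first one, giving different rounding errors and therefore different errors in the subtraction.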
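To see the effect directly, here is a minimal sketch in Python rather than Matlab, on the assumption that both use the same IEEE 754 double format for ordinary floating-point numbers. The helper names `exact` and `rounding_error` are made up for this illustration. It prints the exact value stored for each literal and then repeats the two subtractions from the question.

```python
from decimal import Decimal
from fractions import Fraction


def exact(x: float) -> Decimal:
    """Exact decimal expansion of the double that is actually stored for x."""
    return Decimal(x)


def rounding_error(literal: str) -> Fraction:
    """Stored double minus the decimal value written in the literal (exact)."""
    return Fraction(float(literal)) - Fraction(literal)


# The four literals from the question, before any arithmetic is done.
for literal in ("3.16229", "3.16228", "0.316229", "0.316228"):
    print(literal, "is stored as", exact(float(literal)))
    print("    rounding error:", float(rounding_error(literal)))

# First expression: the literals are subtracted directly.
diff_small = 3.16229 - 3.16228

# Second expression: each literal is rounded, then scaled by 1000,
# which rounds again because the product is not exactly representable.
diff_large = 0.316229 * 10**3 - 0.316228 * 10**3

# 15 digits after the decimal point, similar to Matlab's long E display.
print(f"{diff_small:.15e}")
print(f"{diff_large:.15e}")
```

The stored values of 3.16229 and 0.316229 have unrelated rounding errors, and the multiplication by 1000 adds another rounding step, so there is no reason for the two differences to be off by the same relative amount.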
