Need help for a beginner to floating-point arithmetic

Question

396 Views Asked by Bumbble Comm At 27 Mar 2026 - 12:00

I have this question that I need to complete and I literally have zero idea on what to do. I basically need someone to talk me through it and will appreciate all the help I can get.

The question states:

Consider a "toy system" of floating-point arithmetic such that every number is of the form:

$x=(-1)^s * (1+m) * 2^{e - \delta}$

The mantissa $0 ≤ m < 1$ is a number whose binary representation is of the form:

$m=0 . m_1 m_2 m_3$ (base 2)

where $m_1$, $m_2$, $m_3$ are each either $0$ or $1$. Also, $s$ is either $0$ or $1$. The exponent $e$ is an integer such that $1 ≤ e ≤ 6$, and $\delta = 3$ is the shift.

What is the largest positive number?
What is the smallest positive number?
What is the machine epsilon?
List all the possible values the mantissa can take.
How many floating-point numbers are in this set?

Express all the results as decimal numbers (base $10$).

Now I literally have no idea what this is really on about. So guidance and some help would be really wonderful. Thanks for your time.

**Bumbble Comm** · Answer 1 · 2013-10-21 15:24:45

The notion of machine epsilon in floating-point arithmetic is useful in getting upper bounds on the contribution a rounding error can make in a single operation.

The task here is simply applying the definition. Machine epsilon is the smallest machine-representable positive value we can obtain by subtracting $1$ from a floating-point value slightly greater than $1$. Intuitively we are apt to think of this as a "unit in the last place" of mantissas normalized to exponent $0$, but as we will see the range of exponents can affect this.

For reference let's begin with a representation of $1$ in our "toy" binary arithmetic scheme:

$$ +(1.000)_2 \times 2^0 = 1.0 $$

Note that the net exponent is zero, which means (taking the bias $\delta = 3$ into account) that $e = 3$. The sign bit is $s=0$, of course, since the number $1$ is positive.
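As a minimal sketch of how the format decodes, the formula from the question can be written out directly. The function name `toy_value` and the constant `DELTA` are illustrative names chosen here, not part of the original problem; exact fractions are used so that no rounding from the host machine's own floating point gets in the way.

```python
from fractions import Fraction

DELTA = 3  # exponent shift (bias) given in the problem statement


def toy_value(s, m_bits, e):
    """Decode x = (-1)^s * (1 + m) * 2^(e - DELTA), where m = (0.m1 m2 m3) in base 2.

    `toy_value` is a hypothetical helper for illustration only.
    """
    m1, m2, m3 = m_bits
    m = Fraction(m1, 2) + Fraction(m2, 4) + Fraction(m3, 8)
    return (-1) ** s * (1 + m) * Fraction(2) ** (e - DELTA)


# The number 1 uses sign bit 0, all mantissa bits 0, and e = 3 (net exponent 0).
one = toy_value(0, (0, 0, 0), 3)
print(one)  # 1
```

Calling `toy_value(0, (0, 0, 1), 3)` in the same way gives the next number above one, $9/8$.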

The first thing to consider is the smallest representable number greater than one, obtained by changing the mantissa bit $m_3$ from $0$ to $1$, which gives $(1.001)_2 = 1.125$. That number is representable, but it is so close to $1$ that the exact difference (after subtracting one) is not. The exact difference would be one-eighth (or 0.125 decimal), and representing this in our binary arithmetic scheme would require $e - \delta = -3$, or $e = 0$. The format specification, however, says $1 \le e \le 6$, so we cannot represent that difference. Presumably in an actual calculation the difference might be rounded down to zero, but we don't have to speculate about how it would be handled.
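The claim that this difference falls outside the format can be checked by brute force. The following is a rough sketch that assumes nothing beyond the format given in the question; the variable names are made up for illustration. It lists every positive value, finds the next number above one, and tests whether the gap is itself in the set.

```python
from fractions import Fraction

DELTA = 3  # exponent shift from the problem statement

# Every positive value (1 + k/8) * 2^(e - DELTA) with k = 0..7 and e = 1..6.
positives = {
    (1 + Fraction(k, 8)) * Fraction(2) ** (e - DELTA)
    for k in range(8)
    for e in range(1, 7)
}

next_after_one = min(x for x in positives if x > 1)
gap = next_after_one - 1

# The gap is 1/8, which would need e = 0 and so is not in the set.
print(next_after_one, gap, gap in positives)  # 9/8 1/8 False
```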

Rather, just applying the definition, we see that the smallest positive difference that can be represented is one-quarter, obtained as $(1.010)_2 - 1$ and stored with $e=1$:

$$ +(0.010)_2 = +(1.000)_2 \times 2^{1-3} $$

The limited range of (offset) exponents is thus the bit of trickery in this problem, giving us a machine epsilon of one-quarter, or 0.25 decimal.
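For anyone who wants to check the result, here is a short sketch that applies the definition used in this answer (the smallest representable value of $x - 1$ for a representable $x > 1$) by enumerating the whole toy system. It is only an illustration under the format stated in the question, and the variable names are invented here. As a side effect it also prints the smallest and largest positive numbers and the total count, which the question asks for as well.

```python
from fractions import Fraction

DELTA = 3  # exponent shift from the problem statement

# The eight possible mantissas: 0, 1/8, 2/8, ..., 7/8.
mantissas = [Fraction(k, 8) for k in range(8)]

# All positive values, sorted; each (mantissa, exponent) pair gives a distinct number.
positives = sorted(
    {(1 + m) * Fraction(2) ** (e - DELTA) for m in mantissas for e in range(1, 7)}
)

# Differences x - 1 (for x > 1) that are themselves representable.
representable_diffs = [x - 1 for x in positives if x > 1 and (x - 1) in positives]
eps = min(representable_diffs)

print(float(eps))            # 0.25  machine epsilon under this answer's definition
print(float(positives[0]))   # 0.25  smallest positive number
print(float(positives[-1]))  # 15.0  largest positive number
print(2 * len(positives))    # 96    positive and negative values together
```

The count doubles the positive values because the sign bit $s$ mirrors each of them; zero is not representable in this format since $1 + m \ge 1$.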
