product of orders of magnitude


I have been given quite a theoretical question.

"What are the problems you may encounter when taking the product of 10 numbers of differing orders of magnitude? What approach might you take to help ensure the numerical stability and accuracy of the product?"

I believe this has something to do with a rounding error that gets amplified with each multiplication.

What can I do to solve this problem?

Am I on the right track or can anybody point me in the right direction?

Thanks


There are 3 best solutions below


If you use a fixed-point representation, there is a serious danger of overflow or underflow. And if there are small values that do not cause underflow, they can cause severe loss of significant digits anyway.

The cure is to use a floating-point representation with a normalized mantissa, which makes the computation essentially insensitive to the orders of magnitude of the factors.


I read the question as meaning that you know only the orders of magnitude as integers, and you are trying to find the order of magnitude of the product of the original numbers.

If your definition of "order of magnitude" of some positive real $x$ is the integer $m$ with $10^m \le x \lt 10^{m+1}$, then I would think that your problem is the implicit rounding in this process. Here $m$ is the characteristic of the base-$10$ common logarithm of $x$.

For example $10^{2m} \le x^2 \lt 10^{2m+2}$ and $x^2$ will have order of magnitude $2m$ or $2m+1$. Without knowing $x$, you will not know from $m$ whether to choose $2m$ or $2m+1$.

You will face a similar issue with the product of $n \ge 2$ numbers knowing only their orders of magnitude, as there will be $n$ possible answers, ranging from the sum of the original orders of magnitude to that sum plus $n-1$.

  • With a product of an odd number $n$ of them, there is a sensible estimate since you can pick the middle possibility: add up the orders of magnitude, and then add $\left\lfloor \frac n2 \right\rfloor$ to get a sensible answer.

  • With a product of an even number $n$ of them (in this question $n=10$), the equivalent approach requires an arbitrary choice between the two middle possibilities, and neither is obviously better than the other: add up the orders of magnitude, and then add either $\frac n2 -1$ or $\frac n2$ to get a fairly sensible answer.
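The estimate above can be sketched in C. This is only an illustration under the stated assumptions, and the helper name `product_order_estimate` is mine, not from the question:

```c
#include <stddef.h>

/* Estimate the order of magnitude of a product of n positive numbers,
 * knowing only their integer orders of magnitude m[0..n-1].
 * The true order lies somewhere in sum(m) .. sum(m) + n - 1; we pick
 * the middle of that range (for even n, the upper of the two middle
 * choices, i.e. sum + n/2). */
int product_order_estimate(const int *m, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += m[i];
    return sum + (int)(n / 2);  /* integer division: floor(n/2) */
}
```

For example, three factors of orders $2$, $3$, and $5$ give the estimate $10 + 1 = 11$, the middle of the three possibilities $10$, $11$, $12$.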


I will assume that you are using double-precision (DP) floating-point arithmetic which complies with the IEEE 754 standard. I will use the following properties. Every normal DP number can be written as $$ x = f \cdot 2^m, $$ where the significand $f$ satisfies $$ f \in [1,2)$$ and the exponent $m$ satisfies $$ m \in \{-1022, -1021, \dotsc, 1023\}. $$ The smallest subnormal DP number is $$ y = 2^{-1074},$$ and any result $z$ such that $|z| \leq y/2$ is rounded to zero using the standard rounding mode (round to nearest, ties to even).

Given a sequence of numbers $\{x_j\}_{j=1}^n$, your task is to devise a stable algorithm for computing the product $$p = \prod_{j=1}^n x_j.$$ Let us first consider the natural algorithm, i.e., $$p_1 = x_1, \quad p_j = p_{j-1} x_j, \quad j = 2, 3, \dotsc, n.$$ This algorithm is extremely vulnerable to floating-point overflow/underflow. Consider the case of $$x_1 = x_2 = 2^{-600}, \quad x_3 = x_4 = 2^{600}.$$ It is clear that $$x_1x_2x_3x_4 = 1.$$ However, $$ p_2 = x_1x_2 = 2^{-1200},$$ which is flushed to zero, i.e., the computed value $\hat{p}_2$ of $p_2$ is $$\hat{p}_2 = 0.$$ We mention in passing that the relative error of $\hat{p}_2$ is $1$. Such a large relative error is unacceptable. If we instead attempt to compute $$q = x_3x_4 = 2^{1200},$$ then the computed result is $$\hat{q} = \infty,$$ because the calculation overflows. Naturally, we can be clever and compute $(x_1 x_3)(x_2 x_4)$, for which the computed value is exact, but such cleverness is difficult to implement in general and is only possible when all terms are available from the start.
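A minimal C sketch of this failure mode, using `ldexp` to construct the powers of two exactly:

```c
#include <math.h>

/* p2 = x1*x2 = 2^-1200 lies below half the smallest subnormal
 * (2^-1074), so it is rounded to zero. */
double naive_p2(void) {
    double x1 = ldexp(1.0, -600);   /* 2^-600, exact */
    return x1 * x1;
}

/* q = x3*x4 = 2^1200 exceeds the largest finite double (just
 * below 2^1024), so the product overflows to infinity. */
double naive_q(void) {
    double x3 = ldexp(1.0, 600);    /* 2^600, exact */
    return x3 * x3;
}

/* The interleaved pairing (x1*x3)(x2*x4) never leaves the
 * representable range and yields the exact answer 1. */
double interleaved(void) {
    double x1 = ldexp(1.0, -600), x3 = ldexp(1.0, 600);
    return (x1 * x3) * (x1 * x3);
}
```

Here `naive_p2()` returns exactly `0.0`, `naive_q()` returns infinity, and `interleaved()` returns exactly `1.0`.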

Instead we exploit the fact that the numbers $x_j$ can be written as $$x_j = f_j \cdot 2^{m_j}, \quad f_j \in [1,2), \quad m_j \in \{-1022, \dotsc, 1023\}.$$ I will now show how to write $$p_j = g_j \cdot 2^{k_j}, \quad g_j \in [1,2), \quad k_j \in \mathbb{Z}.$$ I will proceed inductively. It is clear that $$g_1 = f_1, \quad k_1 = m_1.$$ Moreover, if $$p_{j-1} = g_{j-1} \cdot 2^{k_{j-1}}, \quad g_{j-1} \in [1,2), \quad k_{j-1} \in \mathbb{Z},$$ then $$ p_j = g_{j-1} f_j \cdot 2^{k_{j-1} + m_j}.$$ We must now obtain expressions for $g_j$ and $k_j$. By assumption $g_{j-1} f_j \in [1,4)$, hence $$g_{j-1} f_j = h \cdot 2^{r}, $$ where $$h \in [1,2), \quad r \in \{0,1\}.$$ We see that the choice $g_j = h$ and $k_j = k_{j-1} + m_j + r$ ensures $$p_j = g_j \cdot 2^{k_j}.$$ This new algorithm allows you to compute the significand of the product as well as the correct exponent. Obviously, the final result is not necessarily in the representable range, but that is not your problem. It is possible to show that $g_j$ is computed with a relative error which is no worse than $$\gamma_{j-1} = \frac{(j-1) u}{1-(j-1)u} \approx (j-1) u,$$ where $u = 2^{-53}$ is the DP unit roundoff. This proof is either easy or hard depending on which technique your class uses to analyse rounding errors.


In Matlab you can use the function log2 to isolate the exponent and the significand, but mind the fact that $$[f, e]=\text{log2}(x)$$ returns $0.5 \leq f < 1$. In C, the function frexp from math.h will do the same job.
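The inductive update above can be sketched in C with frexp and ldexp. Note that frexp's convention puts the significand in $[0.5, 1)$ rather than $[1,2)$; this only shifts the exponent bookkeeping and does not change the idea. The function name `stable_product` is mine, not standard:

```c
#include <math.h>

/* Compute the product of x[0..n-1] as g * 2^k without intermediate
 * overflow or underflow, by tracking the significand g in [0.5, 1)
 * and the integer exponent k separately. Returns g and stores k in
 * *k_out; if the true product fits in a double, it equals ldexp(g, k). */
double stable_product(const double *x, int n, long *k_out) {
    double g = 1.0;
    long k = 0;
    for (int i = 0; i < n; i++) {
        int e, r;
        double f = frexp(x[i], &e);  /* x[i] = f * 2^e, f in [0.5, 1) */
        g *= f;                      /* g now lies in [0.25, 1)       */
        k += e;
        g = frexp(g, &r);            /* renormalize g into [0.5, 1)   */
        k += r;
    }
    *k_out = k;
    return g;
}
```

On the pathological input $x_1 = x_2 = 2^{-600}$, $x_3 = x_4 = 2^{600}$ this returns $g = 0.5$, $k = 1$, i.e. the exact product $1$; only the final ldexp can overflow or underflow, and the exponent $k$ is always available exactly.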