Something I noticed in the Trefethen & Bau Numerical Linear Algebra book is that, after introducing elementary floating point arithmetic, they do not pay much attention to the initial rounding of the inputs. For example, they initially introduce the numerical approximation of the addition of two real numbers as $$ \tilde{f}(x_1,x_2) = {\rm fl}(x_1) \oplus{\rm fl}(x_2), $$ where ${\rm fl}(\cdot)$ rounds a real number to the nearest floating point number. However, when it comes to the discussion of the backward stability of back substitution, the ${\rm fl}(\cdot)$ completely disappears from the discussion:
Is this omission purely for clarity of exposition, or is there a real reason for dropping it? I noticed that N. Higham's book similarly seems to drop this operation.
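To make the model concrete, here is a minimal sketch of how I read it, assuming float64 stands in for the reals and float32 plays the role of the machine's working precision:

```python
import struct

def fl(x):
    # fl(.): round a real number (here a float64) to the nearest float32
    return struct.unpack('f', struct.pack('f', x))[0]

def oplus(a, b):
    # (+): exact addition followed by rounding back to working precision
    return fl(a + b)

x1, x2 = 0.1, 0.2                    # neither is exactly representable
approx = oplus(fl(x1), fl(x2))       # tilde-f(x1, x2)
rel_err = abs(approx - (x1 + x2)) / abs(x1 + x2)
print(rel_err <= 3 * 2.0 ** -24)     # within a few units of float32 roundoff
```

Note that the result carries both the rounding of the inputs and the rounding of the addition itself.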

I will use an example to explain why the input is normally treated as error-free.
Consider the very general problem of computing $f(x)$, where $f$ is a real function of a single real variable $x$. In general, there are three sources of error:

1. The input $x$ is rounded to a nearby machine number $\hat{x} = {\rm fl}(x)$ before any computation takes place.
2. The function $f$ is replaced by a computable approximation $p$, say, a polynomial.
3. The evaluation of $p(\hat{x})$ is carried out in finite precision arithmetic, producing a value $\hat{y}$ rather than $p(\hat{x})$.
The total error is $E = f(x) - \hat{y}$. It can be analyzed by writing it as a sum of the three errors listed above. Specifically, $$E = \left(f(x) - f(\hat{x})\right) + \left(f(\hat{x}) - p(\hat{x})\right) + \left(p(\hat{x}) - \hat{y}\right).$$ The three contributions to the error have very distinct qualities. The size of the first term depends on the properties of the function $f$ and the quality of $\hat{x}$ as an approximation of $x$. If our responsibility is limited to implementing the function $f$, then this is truly beyond our control. The size of the second term hinges on our ability to choose a good approximation. This is a question of approximation theory. The size of the third term depends on our understanding of finite precision arithmetic.
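The telescoping of the three terms can be checked numerically. The following is a hypothetical worked example (not from any of the books mentioned): $f = \exp$, $p$ a degree-8 Taylor polynomial, $\hat{x} = {\rm fl}(x)$ the input rounded to float32, and $\hat{y}$ the polynomial evaluated with every operation rounded to float32.

```python
import math
import struct

def fl(x):
    # round a float64 to the nearest float32
    return struct.unpack('f', struct.pack('f', x))[0]

def p(t):
    # degree-8 Taylor polynomial of exp at 0, Horner form, in float64
    s = 1.0
    for k in range(8, 0, -1):
        s = 1.0 + s * t / k
    return s

def p_fl(t):
    # the same polynomial with every operation rounded to float32
    s = fl(1.0)
    t = fl(t)
    for k in range(8, 0, -1):
        s = fl(1.0 + fl(fl(s * t) / k))
    return s

x = 1.0 / 3.0
x_hat = fl(x)                        # source 1: the rounded input
y_hat = p_fl(x_hat)                  # what the machine actually returns

E1 = math.exp(x) - math.exp(x_hat)   # data error, propagated through f
E2 = math.exp(x_hat) - p(x_hat)      # truncation (approximation) error
E3 = p(x_hat) - y_hat                # rounding error of the evaluation
E  = math.exp(x) - y_hat             # total error

print(abs(E - (E1 + E2 + E3)) < 1e-12)   # the three terms telescope to E
```

Printing the three terms separately also shows their very different magnitudes, which is the point of keeping them apart.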
The typical textbook on numerical analysis mentions point $1$ briefly, discusses point $2$ in the context of, say, Lagrange interpolation, and concentrates on point $3$ when analyzing specific algorithms. This is not a bad strategy, but I have found it useful to periodically stress that the total error is really a sum of three contributions.
When analyzing the backward stability of an algorithm, we are interested in the algorithm's ability to solve the problem which is actually fed into the machine. It is not the fault of the algorithm if that problem is far from the original problem. This is why it is proper to ignore the error on the input when discussing the backward stability of an algorithm.
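To see this in the simplest case, consider a single addition. Once the inputs have been rounded, $a = {\rm fl}(x_1)$ and $b = {\rm fl}(x_2)$ *are* the problem fed into the machine, and the standard model gives ${\rm fl}(a + b) = (a + b)(1 + \delta)$ with $|\delta| \le u$: the computed sum is the exact sum of machine inputs each perturbed by $(1 + \delta)$. Whether $a$ and $b$ came from rounding $x_1$ and $x_2$ does not enter this statement. A sketch, again simulating float32 as the working precision:

```python
import struct

def fl(x):
    # round a float64 to the nearest float32
    return struct.unpack('f', struct.pack('f', x))[0]

u = 2.0 ** -24                  # unit roundoff of float32

a, b = fl(0.1), fl(0.2)         # the machine's actual input data
s_hat = fl(a + b)               # one rounded addition

# a + b is exact in float64 here (the exponents of a and b are close),
# so delta below is the true relative backward error of the addition.
delta = (s_hat - (a + b)) / (a + b)
print(abs(delta) <= u)          # True: backward error bounded by u
```

The bound on $\delta$ is a property of the algorithm alone; the distance from $(a, b)$ to $(x_1, x_2)$ is a separate, conditioning question.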