Numerical stability of a geometric series


Suppose I want to compute $S_n = \sum_{i = 0}^n \beta^i$. We do have an exact formula for the answer: $S_n = \frac{1 - \beta^{n + 1}}{1 - \beta}$. Let's suppose though that instead of using this formula I computed $S_n$ via recursion, using $S_n = 1 + \beta S_{n - 1}$ ($S_0 = 1$). Numerically, does this make a difference? Is the latter approach less stable or more stable if I were to write a computer program to do this?
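For concreteness, the two approaches can be compared numerically with a quick experiment (a Python sketch; the function names are mine, not from the question):

```python
def geo_closed(beta, n):
    """Closed form: S_n = (1 - beta**(n+1)) / (1 - beta), valid for beta != 1."""
    return (1.0 - beta ** (n + 1)) / (1.0 - beta)

def geo_recursive(beta, n):
    """Recursion: S_0 = 1, S_k = 1 + beta * S_{k-1}."""
    s = 1.0
    for _ in range(n):
        s = 1.0 + beta * s
    return s

# Away from beta = 1 the two agree to machine precision:
print(geo_closed(0.5, 50))     # ~2.0
print(geo_recursive(0.5, 50))  # same value to ~16 digits
```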

BEST ANSWER

Let $n$ be a positive integer and let $f_n : \mathbb R \rightarrow \mathbb R$, given by $$f_n(x) = \sum_{j=0}^{n-1} x^j = \begin{cases} n, & x = 1 \\ \frac{1-x^n}{1-x}, & x \not =1 \end{cases} $$ denote the sum of the first $n$ terms of the geometric series. We want to compute $f_n$ accurately.

We begin by considering the conditioning of $f = f_n$. We note that $f$ is never zero. The condition number of $f$ at the point $x \not = 0$ is given by $$ \kappa_f(x) = \left | \frac{xf'(x)}{f(x)} \right| = \left| \frac{nx^n(x-1)-x(x^n-1)}{(x^n-1)(x-1)} \right|.$$ We conclude that $f$ is ill-conditioned when $x$ is close to $1$. We cannot expect to compute $f(x)$ with a small relative error for a real $x \approx 1$, unless we use a suitably small unit roundoff $u$.
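To quantify the ill-conditioning near $x = 1$ (a supplementary check, not part of the original argument): writing $x = 1 + h$ and expanding $x^n = 1 + nh + \binom{n}{2}h^2 + O(h^3)$, the numerator and denominator of $\kappa_f$ become $$ n x^n (x-1) - x(x^n - 1) = \tfrac{n(n-1)}{2} h^2 + O(h^3), \qquad (x^n-1)(x-1) = n h^2 + O(h^3), $$ so that $$ \lim_{x \to 1} \kappa_f(x) = \frac{n-1}{2}. $$ The condition number near $x = 1$ is therefore of order $n$, and for large $n$ no algorithm can deliver a small relative error there without extra precision.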

We now turn towards the problem of computing $f(x)$, where $x$ is a floating point number, rather than a general real number. I will use the notation $\langle k \rangle$ to denote a term of the form $$1+\theta,$$ where $$|\theta| \leq \gamma_k = \frac{ku}{1-u}.$$ With this notation we have $$ \langle k_1 \rangle \cdot \langle k_2 \rangle = \langle k_1 + k_2 \rangle$$ Basic arithmetic operations, such as addition, satisfy $$ \text{fl}(x + y) = (x + y) \langle 1 \rangle$$ but this (standard) model of floating point computations can also be written as $$ \langle 1 \rangle \text{fl}(x+y) = (x+y)$$ and I will use either form without further comment.

We will use Horner's method to evaluate all relevant polynomials. If $$ p(x) = \sum_{j=0}^m a_j x^j,$$ then Horner's method is $$ p_0(x) = a_m, \quad p_k(x) = x p_{k-1}(x) + a_{m-k}, \quad k = 1,2,\dotsc,m,$$ and the return value is $y = p_m(x)$. In exact arithmetic $$y = p(x).$$ If $mu<1$ and in the absence of floating point exceptions, the computed value satisfies $$ \hat{y} = a_m \langle 2m \rangle x^m + \sum_{j=0}^{m-1} \langle 2j+1 \rangle a_j x^j.$$ It follows, that the forward error satisfies $$ |y - \hat{y}| \leq \gamma_{2m} \sum_{j=0}^m |a_j| |x|^j.$$
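The recurrence above can be transcribed directly (a Python sketch; `horner` is my naming, with `coeffs[j]` holding the coefficient of $x^j$):

```python
def horner(coeffs, x):
    """Evaluate p(x) = sum_j coeffs[j] * x**j by Horner's method."""
    result = 0.0
    for a in reversed(coeffs):  # from a_m down to a_0
        result = x * result + a
    return result

# The all-ones polynomial 1 + x + ... + x**(n-1) is the geometric sum:
n = 10
print(horner([1.0] * n, 0.5))           # 1.998046875
print((1.0 - 0.5 ** n) / (1.0 - 0.5))   # same value
```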

If we apply Horner's method to the polynomial $$p(x) = \sum_{j=0}^{n-1} x^j,$$ then $m=n-1$ and the coefficients, $a_j = 1$, are all positive. It follows that the forward relative error satisfies $$ \frac{|y - \hat{y}|}{|y|} \leq \gamma_{2n-2}, \quad x \ge 0.$$ In short, we are sure to have a small forward relative error for $x \ge 0$. We have no such guarantee for $x<0$.

Horner's method also applies to the polynomial $q$ given by $$ q(x) = x^n - 1. $$ However, the general analysis is a bit pessimistic, because most of the coefficients are zero. Instead we use the recurrence $$q_0(x) = x, \quad q_k(x) = x q_{k-1}(x), \quad k=1,2,\dots,n-1,$$ and return $y = q_{n-1}(x) - 1$. The computed values satisfy $$\hat{q}_0 = x, \quad \hat{q}_k = x \hat{q}_{k-1} \langle 1 \rangle, \quad \hat{y} = (\hat{q}_{n-1} - 1) \langle 1 \rangle.$$ It follows that $$\hat{q}_{n-1} = x^n \langle n-1 \rangle, \quad \hat{y} = x^n \langle n \rangle - \langle 1 \rangle.$$ In reality, we are interested in $$ r(x) = \frac{x^n - 1}{x-1},$$ which we evaluate as $$r(x) = \frac{q(x)}{x-1}.$$ Let $\hat{z}$ denote the computed value; then $$ \hat{z} = \frac{\hat{y}}{x-1} \langle 2 \rangle = \frac{x^n \langle n+2 \rangle - \langle 3 \rangle}{x-1}.$$ It follows that the forward error satisfies $$ |z - \hat{z}| \leq \gamma_{n+2} \frac{|x|^n + 1}{|x-1|}. $$ The forward relative error satisfies $$ \frac{|z - \hat{z}|}{|z|} \leq \gamma_{n+2} \frac{|x|^n+1}{|x^n-1|}.$$ For $x \ge 0$ away from $1$ this bound is small, and it is better than the Horner bound by a factor of roughly $2$, i.e. compare $\gamma_{2n-2}$ with $\gamma_{n+2}$. However, this improvement is insignificant compared with the fact that the relative error is also small for large negative values of $x$! The right-hand side of the bound is only large when $x \approx 1$, i.e. when the problem of computing $f_n$ is ill-conditioned.

Conclusion: of the two alternatives surveyed, we should compute $f_n(x)$ using the formula $$ f_n(x) = \begin{cases} n, & x = 1 \\ \frac{x^n-1}{x-1}, & x \not =1 \end{cases}. $$ If the polynomial $q(x) = x^n-1$ is evaluated using the recurrence $q_k(x) = x q_{k-1}(x)$ above, then we have a small relative forward error for $x \ge 0$ away from $1$, where the problem itself is ill-conditioned. We also have a small relative forward error for $x \ll 0$.
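A direct transcription of this recipe (a Python sketch under my own naming; the answer itself supplies no code):

```python
def geometric_sum(n, x):
    """Compute f_n(x) = 1 + x + ... + x**(n-1) via (x**n - 1)/(x - 1)."""
    if x == 1.0:
        return float(n)
    q = 1.0
    for _ in range(n):   # q_k = x * q_{k-1}, so q = x**n afterwards
        q *= x
    return (q - 1.0) / (x - 1.0)

print(geometric_sum(5, 1.0))    # 5.0
print(geometric_sum(10, -2.0))  # -341.0, exact: (1024 - 1) / (-3)
```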

I have suppressed several details and small calculations. I recommend Higham's book "Accuracy and Stability of Numerical Algorithms", which uses $\gamma$-factors and also the compact notation $\langle k \rangle$.

There are at least two other ways of computing $q$, and the analysis of Horner's method changes dramatically if the computer uses fused multiply-add instructions. These are all excellent questions for another day.

ANOTHER ANSWER

The first formula breaks down as $\beta \rightarrow 1$. Indeed, if $|\beta-1|<\varepsilon$, where $\varepsilon$ is the precision of your machine, then $\beta$ is stored as exactly $1$, both numerator and denominator evaluate to $0$, and the computed $S_n$ will be NaN... With the recursion, on the other hand, your computer will just compute $S_{i+1}=1+S_i$, giving you $S_n=n+1$. That is the exact answer for $\beta = 1$, and in any case far better than NaN.
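This is easy to reproduce (a Python/NumPy sketch; NumPy is used because it follows IEEE semantics and returns NaN for $0/0$, whereas pure Python floats raise ZeroDivisionError; the names are mine):

```python
import numpy as np

def closed_form(beta, n):
    """Closed form (1 - beta**(n+1)) / (1 - beta); 0/0 yields NaN in IEEE."""
    beta = np.float64(beta)
    with np.errstate(invalid="ignore"):  # silence the 0/0 warning
        return (1.0 - beta ** (n + 1)) / (1.0 - beta)

def recursion(beta, n):
    """S_0 = 1, S_k = 1 + beta * S_{k-1}."""
    s = 1.0
    for _ in range(n):
        s = 1.0 + beta * s
    return s

print(closed_form(1.0, 10))  # nan (0/0)
print(recursion(1.0, 10))    # 11.0, the exact value of S_10 at beta = 1
```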

If you know that $\beta$ is not too close to $1$, then just use your first formula; it is much cheaper to compute.