How to calculate the errors of single and double precision


We consider the initial value problem

$$\left\{\begin{matrix} y'=y &, 0 \leq t \leq 1 \\ y(0)=1 & \end{matrix}\right.$$

We apply the Euler method with step size $h=\frac{1}{N}$ and a large number of steps $N$ in order to calculate the approximation $y^N$ of the value of the solution at $t^N=1$, i.e. $y(t^N)=y(1)=e$. The following table lists, for several values of $N$, the errors $|\epsilon^N|=|e-y^N|$ when the calculations are done in single and in double precision.

$$\begin{array}{c|c|c}
N & |\epsilon^N| \text{ single precision} & |\epsilon^N| \text{ double precision} \\
\hline
100 & 0.13468 \cdot 10^{-1} & 0.13468 \cdot 10^{-1} \\
200 & 0.67661 \cdot 10^{-2} & 0.67647 \cdot 10^{-2} \\
400 & 0.33917 \cdot 10^{-2} & 0.33901 \cdot 10^{-2} \\
800 & 0.16971 \cdot 10^{-2} & 0.16970 \cdot 10^{-2} \\
1600 & 0.85568 \cdot 10^{-3} & 0.84898 \cdot 10^{-3} \\
\cdots & & \\
102400 & 0.65088 \cdot 10^{-4} & 0.13273 \cdot 10^{-4} \\
204800 & 0.21720 \cdot 10^{-3} & 0.66363 \cdot 10^{-5} \\
409600 & 0.78464 \cdot 10^{-3} & 0.33181 \cdot 10^{-5} \\
819200 & 0.20955 \cdot 10^{-2} & 0.16590 \cdot 10^{-5} \\
\cdots & &
\end{array}$$

We notice that whenever $N$ is doubled, the errors of the double-precision calculations are approximately halved. However, in the case of single precision, the errors increase for $N>10^5$! Indeed, for a big enough $N$, the errors in our case tend to $1.71828 \dots$.
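The table can be reproduced with a short numerical experiment. The following is a minimal sketch (assuming NumPy is available; the function name `euler_error` is ours), which runs the Euler iteration $y^{n+1}=y^n+h\,y^n$ once with every operation rounded to single precision and once to double precision:

```python
import numpy as np

def euler_error(N, dtype):
    """Euler method for y' = y, y(0) = 1 on [0, 1] with N steps.

    Returns the error |e - y^N| when every operation is rounded
    to the given floating-point type (np.float32 or np.float64)."""
    h = dtype(1.0) / dtype(N)
    y = dtype(1.0)
    for _ in range(N):
        y = dtype(y + h * y)   # each step rounds to the working precision
    return abs(np.e - float(y))

for N in (100, 1600, 102400, 819200):
    print(N, euler_error(N, np.float32), euler_error(N, np.float64))
```

Printing the errors for increasing $N$ shows the halving in double precision and the eventual dominance of rounding in single precision, as in the table above.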

Could you explain to me why the errors increase for $N>10^5$ when the calculations are done in single precision, and why they are approximately halved when the calculations are done in double precision?

Also, how can we calculate the error for a given $N$? For example, if we have $N=10^5$, then $\epsilon^N=|e-y^{10^5}|=\left |e- \left( 1+ \frac{1}{10^5} \right)^{10^5} \right |$. How can we evaluate the latter, knowing that the machine zero is $10^{-6}$ in single precision but $10^{-12}$ in double precision?

EDIT: It holds that $$\ln{\left( 1+ \frac{1}{N}\right)}=\sum_{n=1}^{\infty} (-1)^{n+1} \frac{\left( \frac{1}{N}\right)^n}{n}=\frac{1}{N}- \frac{1}{2N^2}+O(N^{-3}),$$ right? If so, then $N \ln{\left( 1+ \frac{1}{N}\right)}=1-\frac{1}{2N}+O(N^{-2})$, right? If so, how can we then find the difference between the exact solution and the approximation, taking into account that we work in single precision, and likewise in double precision?
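Continuing the expansion in the EDIT above, one can sketch the error in exact arithmetic (rounding ignored for the moment). Exponentiating yields

$$\left(1+\frac{1}{N}\right)^N=e^{N\ln\left(1+\frac{1}{N}\right)}=e^{1-\frac{1}{2N}+O(N^{-2})}=e\left(1-\frac{1}{2N}+O(N^{-2})\right),$$

so that

$$\epsilon^N=e-\left(1+\frac{1}{N}\right)^N=\frac{e}{2N}+O(N^{-2}).$$

For $N=100$ this predicts $e/200 \approx 0.01359$, close to the tabulated value $0.13468 \cdot 10^{-1}$. This describes the discretization error only; in finite precision it remains valid as long as $e/(2N)$ stays well above the accumulated rounding error of the $N$ steps.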

There are 2 answers below.

Answer 1 (score 5):

When $N$ gets large, the size of each step, and thus the size of the change in function value, gets small. You start out with $y(0)=1$, and then you get $$ y(h)=1+\epsilon $$ and if $\epsilon$ is small enough, the computer won't be able to handle the difference between $1$ and $1+\epsilon$ with very good precision. This is the source of the increasing error. Since double precision handles this problem better, you get less error.

As for why the error tends to $1.71828\ldots$, if $\epsilon$ is really small, the computer thinks that $y$ doesn't change at all from time step to time step, and therefore thinks that $y$ is a constant function. You're supposed to get $e$ as the final value, so the error is therefore $e-1$.
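This stagnation can be seen directly; the snippet below is a minimal sketch (assuming NumPy is available). Once $h\,y$ falls below half an ulp of $y$, the single-precision update is absorbed and $y$ never changes:

```python
import numpy as np

y = np.float32(1.0)
h = np.float32(1e-8)   # step small enough that h*y < eps/2 ~ 6e-8 for float32

# The Euler update leaves y unchanged in single precision ...
assert np.float32(y + h * y) == y

# ... but not in double precision, where the increment is still representable.
assert np.float64(1.0) + 1e-8 * np.float64(1.0) != np.float64(1.0)

# So for such h the computed single-precision solution stays at 1.0 forever,
# and the final error is e - 1 = 1.71828...
```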

Answer 2 (score 9):

The Euler method has local discretization error of order 2 and global error of order 1. A more detailed analysis shows the global error to be proportional to $e^{LT}h$, where $L$ is a Lipschitz constant and $T$ the integration period. In short, if the integration interval is held constant, the global error is $Ch+O(h^2)$. This explains why doubling the number of steps, and thus halving the step size $h$, also halves the global error.

Each integration step also incurs a floating-point rounding error that is a small multiple $Dμ$ of the machine precision $μ$ of the floating-point type. This error accumulates over the $N=1/h$ steps and adds to the discretization error, both in each step and globally.

In total this gives an expression for the computed global error of the form $$ Ch+NDμ=Ch+\frac{TD·μ}h=2\sqrt{CDT·μ}+\frac{C}{h}·\left(h-\sqrt{\frac{DT·μ}{C}}\right)^2. $$ This is a convex function of $h$ with a minimum ("valley") at approximately (disregarding the constants) $h≈\sqrt{μ}$, where the error has about the same magnitude $\sqrt{μ}$.

  • For the float type $μ≈ 10^{-7}$ and thus the minimal error is attained for approximately $h≈ 10^{-4}$ to $10^{-3}$.
  • For the double type, $μ≈10^{-15}$, so the valley is to be found at step sizes $h≈ 10^{-8}$ to $10^{-7}$.
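The location of the valley can be sketched numerically. In the snippet below the constants $C=D=T=1$ are illustrative assumptions (not values fitted to the table above), and $\mu=2^{-23}$ is the single-precision machine epsilon:

```python
import math

# Illustrative error model E(h) = C*h + T*D*mu/h for single precision.
# C and D are assumed constants, T = 1 is the integration interval.
C, D, T = 1.0, 1.0, 1.0
mu = 2.0 ** -23                       # machine epsilon of IEEE float32

def E(h):
    """Total error model: discretization part C*h plus rounding part T*D*mu/h."""
    return C * h + T * D * mu / h

h_opt = math.sqrt(D * T * mu / C)     # minimizer of the convex model
print(h_opt, E(h_opt), 2 * math.sqrt(C * D * T * mu))
```

With these assumed constants the minimum sits near $h\approx 3.5\cdot 10^{-4}$ and the minimal error near $7\cdot 10^{-4}$, consistent with the stated order of magnitude $\sqrt{\mu}$.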

Thus if you were to continue the table, you would see growing errors for the double type as well, after a lowest error of about $10^{-7}$ is reached. As a test, for $h=10^{-9}$, i.e. $N=10^9\approx 2^{23}·100$, the error should be above $10^{-6}$.


The error progression of the Euler method can be compared to compound-interest computations where the interest rate per integration step is $Lh$. This is a consequence of the Gronwall lemma. The local error in step $k$ is thus amplified over the remaining $N-k-1$ integration steps by a factor $(1+Lh)^{N-k-1}$ as part of the global error. The local error is the sum of the discretization error, bounded by $c·h^2$, and the floating-point error, bounded by $d·μ$, where $c$ contains the magnitude of the system function $f$ and $d$ the magnitude of its derivative and the cost of evaluating $f$. Thus the estimate of the global error has the form \begin{align} (c·h^2+d·μ)·\sum_{k=0}^{N-1}(1+Lh)^{N-k-1} &=(c·h^2+d·μ)·\frac{(1+Lh)^N-1}{Lh} \\ &\simeq \left(\frac{c·h}L+\frac{d·μ}{Lh}\right)·(e^{LT}-1) \end{align}
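The geometric-sum step and the final approximation can be checked numerically; in this sketch $L=T=1$ and $N=1000$ are arbitrary example values:

```python
import math

L, T, N = 1.0, 1.0, 1000
h = T / N

# Amplification factors (1+Lh)^(N-k-1) summed over all N steps ...
s = sum((1 + L * h) ** (N - k - 1) for k in range(N))

# ... equal the closed form ((1+Lh)^N - 1)/(Lh) ...
closed = ((1 + L * h) ** N - 1) / (L * h)

# ... which for small h approaches (e^{LT} - 1)/(Lh).
approx = (math.exp(L * T) - 1) / (L * h)

print(s, closed, approx)
```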