Step size $h$ in the incremental ratio approximation of the derivative

1.8k Views Asked by At

Why is $$h = \epsilon \cdot(1 + |x|)$$ a good step size in approximating numerically the derivative of a function $f:\mathbb{R} \to \mathbb{R}$ with $$\frac{f(x+h)-f(x)}{h}?$$

1

There are 1 best solutions below

3
On BEST ANSWER

The premise of the question is incorrect: The choice of $h = \epsilon(1+|x|)$ is much too small.

There are two sources of error in approximating a derivative. The first comes from calculating it as a difference quotient, and it is easily seen that the error from this is \begin{align*} \left| \frac{f(x+h) - f(x)}{h} - f'(x) \right| \le \frac{h}{2}|f''(x)| \end{align*} However, we cannot calculate $f(x)$ and $f(x+h)$, we calculate some representable numbers close to them (call them $\hat{f}(x+h)$ and $\hat{f}(x)$). For floating point arithmetic with a guard digit we know that $\hat{f}(x+h) = f(x+h)(1+\epsilon_{1})$ and $\hat{f}(x) = f(x)(1+\epsilon_{2})$, where $|\epsilon_1|, |\epsilon_2| \le \epsilon$, where $\epsilon$ is the unit roundoff.1 So the total error is \begin{align*} \left| \frac{\hat{f}(x+h) - \hat{f}(x)}{h} - f'(x) \right| &= \left| \frac{f(x+h)(1+\epsilon_1) - f(x)(1+\epsilon_2)}{h} - f'(x) \right| \\ &\le\frac{h}{2}|f''(x)| + \frac{1}{h}(|f(x+h)|+|f(x)|)\epsilon =: g(h) \end{align*} We can see that taking $h$ as the smallest representable is likely to lead to castatrophe as the error from the roundoff of the function evaluation becomes unbounded. Instead we want to minimize the maximum error $g(h)$. You can differentiate $g$ and set it to zero to discover that the choice of $h$ that minimizes the maximum error is \begin{align*} \sqrt{\frac{2(|f(x+h)| + |f(x)|)\epsilon}{|f''(x)|}} \end{align*} For this choice of $h$, $g(h) = \sqrt{\epsilon}(|f''(x)| + (|f(x+h)| + |f(x)|)/2)$. Since we do not know $f''(x)$ (and cannot compute it for less than the cost of using a higher order method) then we can choose \begin{align*} h = 2\sqrt{\epsilon} \end{align*} This choice preserves the $O(\sqrt{\epsilon})$ error estimate irrespective of the ratio $(|f(x+h)| + |f(x)|)/|f''(x)|$.

There is a pathological case: If $x$ is so large that $x + 2\sqrt{\epsilon} == x$, then our choice of $h$ calculates the derivative to be precisely zero. Then the correct choice of $h$ is h = float_next(x) - x, which is roughly $|x|\epsilon$. However, in this regime $|x|\epsilon \gg 2\sqrt{\epsilon}$. The error is then dominated by the fact that the distance between floats increases as $|x|$ increases.

All the previous analysis depends on $x+h$ being representable. If $x+h$ is not representable, then you can show that the maximum error is \begin{align*} \frac{h}{2}|f''(x)| + \frac{1}{h}(|f(x+h)|+|f(x)| + |xf'(x)|)\epsilon \end{align*} but this does not change the conclusion that we should choose $h \sim O(\sqrt{\epsilon})$.

As to choosing $h = \epsilon(1+|x|)$, we can see that the error of this choice is \begin{align*} \epsilon(1+|x|)|f''(x)|/2 + \frac{|f(x+h)| + |f(x)|}{1+|x|} \approx \frac{2|f(x)|}{1+|x|} \end{align*} So the error is enormous.

1 Note that this assumes that the algorithm to compute our function produces the best representable estimate for the given precision. For C standard library functions there are sometimes guarantees that this is the case but most special function libraries do not compute to this accuracy as the compute time to extract the last few digits of precision represent the bulk of the computation time. Hence we should assume that $\hat{f}(x) \approx f(x)(1+k\epsilon)$ for some $k \in \mathbb{N}$.