How accurate is numerical differentiation for high-order derivatives?

310 Views Asked by At

Suppose you have a discrete probability distribution where the mass function $p$ is complicated and hard to compute, but the probability generating function $G$ is simple and easy to compute. It is well-known that these two functions are related by:

$$p(k) = \frac{1}{k!} G^{(k)}(0).$$

Now, suppose you want to compute the mass function for some reasonably large value of $k$, but it is too difficult to get an analytic expression for the $k$th derivative of the probability generating function. An obvious option is to fall back on numerical differentiation. But how accurate is this, given that you need to compute a high-order derivative? Given that there is some error in each step of the numerical differentiation, for large $k$, I imagine that this would blow the accuracy to pieces.

This kind of situation tends to occur when you are dealing with large sums of random variables, in cases where you can't rely on the central limit theorem. For example, suppose you have a random variable $K = X_1 + \cdots + X_n$ composed of $n$ independent (discrete) random variables. In this case the mass function $p$ is a convolution of $n$ mass functions, so it is likely to be complicated and hard to compute. However, the probability generating function is:

$$G(z) = \prod_{i=1}^n G_i(z).$$

Assuming that the individual generating functions $G_1,...,G_n$ are simple to compute, it will be simple to compute their product, and to it is simple to compute $G$. Taking high-order derivatives becomes cumbersome, since the generating function is a product of a large number of functions.


Question: Generally speaking, using common mathematical/statistical software, how accurate is numerical differentiation for these kinds of high-order derivatives? Are there any "order-based" results that would let me know if it is feasible to compute the mass values by this method? How big could $k$ be before numerical differentiation would give a bad answer to this problem?

(Note: I am well aware that this answer depends on a range of factors, including the form of $G$ and the numerical procedure used, including the number of digits stored at each step. In view of this, I am looking for any useful information on the feasibility of this computation method in realistic situations; feel free to fill in the blanks with any assumptions you need, as you see fit.)