The central limit theorem in the binomial distribution case, also known as the De Moivre–Laplace theorem, was historically used to approximate the binomial distribution with the normal distribution.
I noticed that current statistical software does not use this approximation (or a better one, such as the Camp–Paulson approximation). As far as I have seen, these packages reduce the binomial distribution to the beta distribution. Some examples:
- Implementation of pbinom() of the statistics software R.
- Implementation of binocdf() of GNU Octave.
- Mathematica also uses the beta distribution.
This leads me to the following question: Is the approximation given by the central limit theorem for the binomial distribution still used today? Are there real-world applications of this theorem? Since most statisticians use computer programs which do not implement the normal approximation, I wonder whether the central limit theorem for the binomial distribution is still important...
Note: I do not want to discuss the historical importance of this theorem and I want to exclude all fields of (higher) education. I want to know whether the theorem by De Moivre and Laplace is still used outside the university...
Binomial Confidence Intervals.
Perhaps the most common practical use of the normal approximation to the binomial is to use data to find a confidence interval for the binomial success probability $\theta =$ P(Success) on any one trial. If $X$ is the number of Successes in $n$ trials, then $\theta$ is estimated as $\hat \theta = X/n.$ Assuming $\hat \theta$ to be normal with $E(\hat \theta) = \theta$ and $V(\hat \theta) = \frac{\theta(1-\theta)}{n},$ one has
$$P\left(-1.96 \le Z = \frac{\hat\theta - \theta}{\sqrt{\theta(1-\theta)/n}} \le 1.96\right) = 0.95,$$ where $Z$ is standard normal.
Making the additional, often unwarranted, assumption that the denominator is well approximated by $\sqrt{\hat \theta(1 - \hat\theta)/n},$ one has the traditional (sometimes called Wald) 95% confidence interval $\hat \theta \pm 1.96 \sqrt{\hat \theta(1-\hat \theta)/n}$ for $\theta$. However, in the late 1990s it was discovered that this style of confidence interval can have actual coverage probability quite different from (often much less than) the promised 95% unless $n$ is very large (as, for example, in a public opinion poll). [See Brown, Cai, & DasGupta (2001).]
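The coverage shortfall is easy to check exactly, because for fixed $n$ and $\theta$ the coverage probability is a finite sum of binomial probabilities over the outcomes whose interval covers $\theta$. A Python sketch (the same computation is straightforward in R; the choices $n = 50$ and $\theta = 0.2$ here are my own illustrative values, not from the text above):

```python
from math import comb, sqrt

def wald_covers(x, n, theta, z=1.96):
    """True if the nominal 95% Wald interval built from x successes covers theta."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half <= theta <= p_hat + half

def exact_coverage(n, theta):
    """Exact coverage probability of the Wald interval:
    sum the binomial pmf over the outcomes x whose interval covers theta."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x)
               for x in range(n + 1) if wald_covers(x, n, theta))

cov = exact_coverage(50, 0.2)
print(round(cov, 3))  # noticeably below the nominal 0.95
```

For $n = 50,\ \theta = 0.2$ the intervals that cover $\theta$ are exactly those with $6 \le x \le 16$, and the resulting coverage is around 93%, not 95%.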
A more accurate CI that depends only on the normal approximation and not on the additional approximation for the denominator in the displayed relationship, is the Wilson CI. It has a more complex form (see Wikipedia). For a 95% CI it is almost as good to use the interval $\tilde \theta \pm 1.96 \sqrt{\tilde \theta(1-\tilde \theta)/\tilde n},$ where $\tilde n = n + 4$ and $\tilde \theta = (X+2)/\tilde n.$ This style of CI is sometimes called the 'plus-4' (or Agresti) interval.
Agresti, Wilson, and (for large $n$) Wald CIs are still widely recommended in respected textbooks--probably because no computer software is required. Also, for roughly $0.3 \le \hat \theta \le 0.7$ and large $n$, it turns out that the margin of error $1.96\sqrt{\hat \theta(1-\hat \theta)/n}$ or $1.96\sqrt{\tilde \theta(1-\tilde \theta)/\tilde n}$ is conveniently close to $1/\sqrt{n}.$
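For concreteness, the three normal-based intervals can be computed side by side. A Python sketch of the arithmetic (the function names are mine; in R, `prop.test(x, n, correct = FALSE)` gives the Wilson interval):

```python
from math import sqrt

Z = 1.96  # standard normal quantile for 95% confidence

def wald_ci(x, n, z=Z):
    """Traditional Wald interval: p-hat plus/minus z * estimated SE."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return (p - half, p + half)

def wilson_ci(x, n, z=Z):
    """Wilson (score) interval: solve |Z| <= z for theta in the displayed relation."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

def plus4_ci(x, n, z=Z):
    """Agresti 'plus-4' interval: add 2 successes and 2 failures, then Wald."""
    n_t = n + 4
    p_t = (x + 2) / n_t
    half = z * sqrt(p_t * (1 - p_t) / n_t)
    return (p_t - half, p_t + half)

for name, ci in [("Wald", wald_ci(30, 100)),
                 ("Wilson", wilson_ci(30, 100)),
                 ("plus-4", plus4_ci(30, 100))]:
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Running this with $n = 100,\ X = 30$ reproduces (to rounding) the interval estimates quoted in the example below.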
If software is available, intervals based on the beta distribution with very nearly the nominal coverage probability are often recommended. These are derived from a Bayesian framework with minimally informative prior distributions. But frequentists also use them (albeit with a somewhat different interpretation than Bayesians). Specifically, one can get a $100(1-\alpha)\%$ interval estimate by cutting probability $\alpha/2$ from both tails of $Beta(X + \kappa,\, n - X + \kappa),$ often with $\kappa = 1$ or $1/2$.
As an example, if $n = 100$ and $X = 30,$ then four 95% interval estimates for $\theta$ are $(.210, .390)$ from Wald, $(.219,.395)$ from Wilson, $(.219,.396)$ from Agresti, and $(.217,.395)$ from the R code

qbeta(c(.025, .975), 30.5, 70.5)

Binomial Tests.
As suggested in the Comment by @dsaxton, a closely related issue is the test of $H_0: \theta = \theta_0$ against the two-sided alternative $H_a: \theta \ne \theta_0.$ Then, assuming $H_0$ to be true, $$Z = \frac{\hat \theta - \theta_0}{\sqrt{\theta_0(1-\theta_0)/n}}$$ is approximately $Norm(0, 1).$ Consequently, $H_0$ is rejected at the 5% level of significance if $|Z| > 1.96.$ 'Acceptable' values of $\theta_0$ are precisely the values in the 95% Wilson interval.
Here the only assumption is that $Z$ is approximately standard normal, which is close to true for large enough $n$ and $\theta_0$ fairly near 1/2. (Various texts give various rules of thumb for this: one is that $n\theta_0$ and $n(1-\theta_0)$ both exceed 5.) An inconvenience of an exact Binomial test is that it is not generally possible to put probability 2.5% in each 'tail' of the discrete binomial distribution.
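To make the test concrete, the approximate $Z$ statistic can be set against an exact binomial $p$-value. A Python sketch using the data from the CI example ($n = 100,\ X = 30$) with the null value $\theta_0 = 0.5$, a choice of mine for illustration:

```python
from math import comb, sqrt

def z_stat(x, n, theta0):
    """Normal-approximation test statistic for H0: theta = theta0."""
    return (x / n - theta0) / sqrt(theta0 * (1 - theta0) / n)

def exact_two_sided_p(x, n, theta0=0.5):
    """Exact two-sided p-value; doubling one tail is clean when theta0 = 0.5
    because the null binomial distribution is then symmetric."""
    pmf = [comb(n, k) * theta0**k * (1 - theta0)**(n - k) for k in range(n + 1)]
    tail = sum(pmf[: x + 1]) if x < n * theta0 else sum(pmf[x:])
    return min(1.0, 2 * tail)

z = z_stat(30, 100, 0.5)        # -4.0, so |z| > 1.96: reject H0 at the 5% level
p = exact_two_sided_p(30, 100)  # the exact p-value is also far below 0.05
print(z, p)
```

Here both approaches agree decisively; the rule of thumb is comfortably met, since $n\theta_0 = n(1-\theta_0) = 50 > 5$.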
Usage in practice
Binomial CIs and tests are among the most widely used in practice--throughout the social, biological, and physical sciences. It is easy to overestimate the enthusiasm for computer methods across this broad spectrum of users, particularly when those methods rely on theory, distributions, and techniques that are not widely taught outside the mathematical sciences.
Indeed, just the realization that the Wald method can give truly horrible results for sample sizes below several hundred has dawned at an astonishingly slow pace--even inside the university environment. But you asked what is in use, not what ought to be in use.