Approximation of binomial distribution - Poisson vs Normal distribution


Suppose we have a random variable $X \sim \text{Bin}(n, p)$ and we want to approximate $\mathbb{P}\left[X = k\right]$.

In my statistics course we learned two ways to do this:

  • By using the fact that $X \approx \text{Poi}(\lambda = n\cdot p)$
  • By using the de Moivre-Laplace theorem, i.e. approximating $X$ by a normal distribution

I understand how to apply both of them, but in the context of an exam I am unsure which one to choose for a given problem, if there is no requirement stated.

What are the requirements, disadvantages, etc. of each choice?

Best answer

The two approximations describe two different limiting distributions of the binomial distribution.

  1. Poisson approximation. If $n$ is large but $np$ is not large, then $\operatorname{Bin}(n, p) \approx \operatorname{Poisson}(np)$. More precisely, if $X_n \sim \operatorname{Bin}(n, p_n)$ and $n p_n \to \lambda \in [0, \infty)$ as $n\to\infty$, then

    $$ \lim_{n\to\infty} \mathbf{P}(X_n = k) = \frac{\lambda^k}{k!}e^{-\lambda} \qquad \text{for all} \quad k = 0, 1, 2, \cdots.$$

  2. Normal approximation. If $n$ is large and $p \in (0, 1)$ is fixed, then $\operatorname{Bin}(n, p) \approx \mathcal{N}(np, np(1-p))$. More precisely, if $X_n \sim \operatorname{Bin}(n, p)$, then

    $$ \lim_{n\to\infty} \mathbf{P}\left( \frac{X_n - np}{\sqrt{np(1-p)}} \leq z \right) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \, \mathrm{d}x \qquad \text{for all} \quad z \in \mathbb{R}.$$
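
Since the question asks about $\mathbf{P}(X = k)$ while the normal limit above is a statement about cumulative probabilities, it may help to spell out the usual way to get a pointwise estimate from it. This is the standard continuity correction, which is not part of the statement above; here $\Phi$ denotes the standard normal distribution function, and $n$ is assumed large with $p$ fixed:

$$ \mathbf{P}(X = k) = \mathbf{P}\left(k - \tfrac{1}{2} < X \leq k + \tfrac{1}{2}\right) \approx \Phi\left( \frac{k + \frac{1}{2} - np}{\sqrt{np(1-p)}} \right) - \Phi\left( \frac{k - \frac{1}{2} - np}{\sqrt{np(1-p)}} \right) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \exp\left( -\frac{(k - np)^2}{2np(1-p)} \right). $$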

As the precise statements show, these approximations primarily cover the limiting cases, so they do not draw a sharp line between the two regimes. It is perhaps better to see directly how well each method performs for different parameter values. The following plots show the three distributions for $n = 1000$ and varying $p$.

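[Figure: Comparison of the binomial, Poisson and normal distributions for $n = 1000$ and several values of $p$]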

As we see, if $p$ is small, then the Poisson approximation is much better (so much so that the plot markers for the binomial and Poisson distributions are barely distinguishable), while the normal approximation is better for large $p$. There is also a grey area where both approximate the binomial distribution moderately well.
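
The same comparison can be checked numerically. The following is a minimal sketch, not the code behind the plots: it measures, for each $p$, the largest absolute error in $\mathbf{P}(X = k)$ over all $k$ for both approximations. The function names, the choice of error measure, the continuity correction used for the normal approximation and the particular values of $p$ are all assumptions made for illustration.

```python
import math


def binom_pmf(k, n, p):
    """Exact Bin(n, p) probability of k, computed in log space to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """Poisson(lam) probability of k, also computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_point_approx(k, n, p):
    """Normal estimate of P(X = k) using a continuity correction (an assumption)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return normal_cdf((k + 0.5 - mu) / sigma) - normal_cdf((k - 0.5 - mu) / sigma)


def max_abs_errors(n, p):
    """Return (Poisson error, normal error): largest |exact - approx| over k = 0..n."""
    poisson_err = 0.0
    normal_err = 0.0
    for k in range(n + 1):
        exact = binom_pmf(k, n, p)
        poisson_err = max(poisson_err, abs(exact - poisson_pmf(k, n * p)))
        normal_err = max(normal_err, abs(exact - normal_point_approx(k, n, p)))
    return poisson_err, normal_err


if __name__ == "__main__":
    n = 1000
    # Illustrative values of p; the values used in the plots are not stated.
    for p in (0.001, 0.01, 0.1, 0.5):
        poisson_err, normal_err = max_abs_errors(n, p)
        print(f"p = {p}: Poisson error {poisson_err:.2e}, normal error {normal_err:.2e}")
```

With $n = 1000$, the Poisson error should come out smaller at $p = 0.001$ and the normal error smaller at $p = 0.5$, matching the pattern described above. The maximum pointwise error is only one possible yardstick; total variation distance would be another reasonable choice.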