Central limit theorem and application to binomial distribution


It is said that if the product of $n$, the number of trials, and $p$, the probability of a success is large, then a binomial distribution can be accurately approximated by a normal distribution.

Is the theory supporting this the Central Limit Theorem? When I think of central limit theorems, I usually think of the sum or mean of a sequence of IID random variables, where the sum or mean approaches a normal distribution as the number of variables approaches infinity. However, in the current case, I don't see sums or means, so is the idea that the binomial distribution can be approximated by a normal distribution because of the CLT, or because of some other condition?


Elaborating on the document cited in the OP's comment, the claim is that the hypotheses $0\le p_n,q_n\le 1$, $np_n\to\infty$, and $ nq_n\to\infty$ (where $q_n=1-p_n$) together imply that the CLT applies to $X_n\sim \operatorname{Bin}(n,p_n)$, in the sense that $$\lim_{n\to\infty}P\left(\frac{X_n-np_n}{\sqrt{np_nq_n}}<x\right)= \lim_{n\to\infty}F_n(x) = \Phi(x)$$ for all $x$. (Here $F_n$ is the cdf of the standardized version of $X_n$, and $\Phi$ is the standard normal cdf.) In the same vein, one could ask whether the same conclusion follows under the simpler-looking hypothesis that $np_nq_n\to\infty$.

This version differs from the original statement of the problem by imposing a condition on $n q_n$, as does the web page cited in the OP's comment.

By the Berry-Esseen theorem (a sharpening of the usual central limit theorem) we know that there is a constant $C$ such that for all $n$ and all real $x$, $$ |F_n(x)-\Phi(x)|\le \frac {C\rho_n}{\sigma_n^3\sqrt n}=B_n\text{ say},$$ where $\sigma_n=\sqrt{p_nq_n}$ and $\rho_n=p_nq_n(p_n^2+q_n^2)$. Note that if $Z_n=-p_n$ with probability $q_n$ and $Z_n=1-p_n$ with probability $p_n$, then $X_n-np_n$ is a sum of $n$ independent copies of $Z_n$, and $E[Z_n]=0$, $\sigma_n^2=E[Z_n^2]$, and $\rho_n=E[|Z_n|^3]=q_np_n^3+p_nq_n^3$.

So now one just notices that $$B_n=\frac{C(p_n^2+q_n^2)}{\sqrt{np_nq_n}}=\frac{Cp_n^{3/2}}{\sqrt{n q_n}} + \frac{Cq_n^{3/2}}{\sqrt{n p_n}} = O\left(\frac 1{\sqrt{np_n}} + \frac 1{\sqrt {nq_n}}\right) = o(1),$$ under the first set of hypotheses (the $O$ step uses $p_n,q_n\le 1$). If one assumes $np_nq_n\to\infty$ instead, the result follows similarly, since $p_n^2+q_n^2\le 1$: $B_n\le C/\sqrt{np_nq_n}=o(1)$. In either case, if you want $B_n$ to be less than (say) $1/10$, this tells you what your thresholds on $np_n$ and $nq_n$ must be, and so on.
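To get a feel for how the bound behaves, here is a rough numerical sketch in Python that compares $B_n$ with the actual worst-case gap $\sup_x|F_n(x)-\Phi(x)|$ for a few choices of $n$ and $p$. The function names `berry_esseen_bound` and `sup_cdf_error` are made up for this illustration, and the value $C=0.5$ is an assumption (the answer above does not fix $C$; published admissible values for the i.i.d. case are somewhat below $0.5$, so treat the number as illustrative).

```python
import math

def berry_esseen_bound(n, p, C=0.5):
    """B_n = C (p^2 + q^2) / sqrt(n p q); C = 0.5 is an assumed constant."""
    q = 1.0 - p
    return C * (p * p + q * q) / math.sqrt(n * p * q)

def normal_cdf(x):
    """Standard normal cdf Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Bin(n, p) with 0 < p < 1, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def sup_cdf_error(n, p):
    """sup_x |F_n(x) - Phi(x)| for the standardized Bin(n, p).

    F_n is a step function, so the supremum is attained at a jump point,
    approached from the left or from the right; both sides are checked.
    """
    mu = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    cdf = 0.0      # running P(X < k)
    worst = 0.0
    for k in range(n + 1):
        phi = normal_cdf((k - mu) / sd)
        worst = max(worst, abs(cdf - phi))   # just left of the jump at k
        cdf += binom_pmf(n, k, p)            # now P(X <= k)
        worst = max(worst, abs(cdf - phi))   # at / just right of the jump
    return worst

# Compare the actual error with the bound for a few (n, p) pairs.
for n, p in [(100, 0.5), (100, 0.05), (1000, 0.05), (1000, 0.005)]:
    print(n, p, n * p, sup_cdf_error(n, p), berry_esseen_bound(n, p))
```

The comparison is made without a continuity correction, matching the statement of the limit above. Solving $B_n<\varepsilon$ for $n$ gives the threshold $np_nq_n > C^2(p_n^2+q_n^2)^2/\varepsilon^2$, which is one way to read the last remark of the answer.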