How well is a binomial distribution with $N=100,000$ approximated by a normal distribution?


A binomial distribution with large $N$ and $p=0.5$ is approximately a normal distribution with mean $\frac12N$ and standard deviation $\frac12\sqrt{N}$. However, according to large deviations theory, this only holds "close to the mean". For example, the probability of obtaining the value $0.8N$ may differ from what the normal distribution predicts.

I would like to know up to how many standard deviations the normal distribution is a good approximation for $N=100,000$. What is the largest integer $l$ such that for all $x$ which are at most $l$ standard deviations away from the mean $\frac12N$, the relative difference between the normal distribution pdf value and the binomial distribution pmf value is at most $10\%$?

There is 1 best solution below

With $\sigma^2 = N/4$, and for $x$ fixed, Stirling's formula gives an asymptotic expansion
$$ \frac{ \displaystyle{ \frac{1}{2^N} \binom{N}{N/2 + x \sigma} } } { \displaystyle{ \frac{e^{-x^2/2}}{\sigma \sqrt{2 \pi}} } } = 1 - \frac{(x^4 - 6 x^2 + 3)}{12N} + \ldots $$
As $N \rightarrow \infty$ this recovers the central limit theorem. Although this is for fixed $x$, it suggests that there is non-trivial error in the regime where $x^4 \sim N$.

Thus write $x = N^{1/4} y$ and repeat the analysis with Stirling's formula to obtain, for fixed $y$,
$$ \lim_{N \rightarrow \infty} \frac{ \displaystyle{ \frac{1}{2^N} \binom{N}{N/2 + y N^{1/4} \sigma} } } { \displaystyle{ \frac{e^{-x^2/2}}{\sigma \sqrt{2 \pi}} } } = e^{-y^4/12}.$$
In particular, for an error of 10 percent one sets $e^{-y^4/12} = 9/10$, that is,
$$y = \sqrt[4]{12 \log(1 + 1/9)},$$
so the offset from the mean is
$$k = \sigma x = N^{1/4} \sigma y = \frac{N^{3/4}}{2} \cdot \sqrt[4]{12 \log(1 + 1/9)} = N^{3/4} \cdot 0.530193 \ldots.$$
This is the answer as $N \rightarrow \infty$.

If you specialize to $N = 100000$, you get $$k = 2981.499 \ldots $$ or $$x = 18.85666\ldots $$ standard deviations. In this case, if you take $k = 2981$ and
$$x^2 = \frac{k^2}{\sigma^2} = \frac{8886361}{25000},$$
then
$$ \frac{ \displaystyle{ \frac{1}{2^{100000}} \binom{100000}{52981}} } { \displaystyle{ \frac{e^{-8886361/50000}}{100 \sqrt{5 \pi}} } } = 0.90153\ldots $$
which shows that already for $N$ in this range the answer above is quite accurate: the first value at which the ratio actually drops strictly below $0.9$ (a relative error of more than 10%) occurs at $52994$, which is only $13$ away.

So if you are within 18 standard deviations then the approximation is valid to within 10%, but not at 19 standard deviations, and in general the number of standard deviations is (as seen above) asymptotic to $N^{1/4} \cdot 1.06038\ldots$.
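
For readers who want to see where the $e^{-y^4/12}$ comes from, here is a rough sketch of the Stirling computation. It is an outline, not a full proof; the shorthand $t = 2k/N = x/\sqrt{N}$ is introduced here for convenience, and the error terms assume $t$ stays bounded away from $\pm 1$. Applying $\log n! = n\log n - n + \tfrac12\log(2\pi n) + O(1/n)$ to the three factorials gives
$$ \log\left[\frac{1}{2^N}\binom{N}{N/2+k}\right] = \frac12\log\frac{2}{\pi N} - \frac N2\Big[(1+t)\log(1+t) + (1-t)\log(1-t)\Big] - \frac12\log(1-t^2) + O(1/N). $$
Since $(1+t)\log(1+t) + (1-t)\log(1-t) = t^2 + \frac{t^4}{6} + O(t^6)$ and $\log\left[\frac{e^{-x^2/2}}{\sigma\sqrt{2\pi}}\right] = \frac12\log\frac{2}{\pi N} - \frac{N t^2}{2}$, subtracting yields
$$ \log(\text{ratio}) = -\frac{x^4}{12N} + \frac{x^2}{2N} + O\!\left(\frac{x^6}{N^2}\right) + O\!\left(\frac1N\right). $$
Substituting $x = N^{1/4}y$ with $y$ fixed, every term except $-y^4/12$ is $O(N^{-1/2})$, so the ratio tends to $e^{-y^4/12}$.
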
More generally, if you are $o(N^{1/4})$ standard deviations away the relative error will be negligible; if you are $\omega(N^{1/4})$ standard deviations away the relative error will approach 100%; and if you are $N^{1/4} y$ standard deviations away the relative error is $\sim 1 - e^{-y^4/12}$.
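
The figures quoted for $N = 100000$ can be checked numerically. The following is a minimal sketch, not the method used to produce the numbers above: it works in log space with `math.lgamma` to avoid overflow in $\binom{N}{k}$ and $2^N$. The function names (`log_binom_pmf_half`, `log_normal_pdf`, `ratio`, `first_offset_below`) are made up for this illustration, and $p = 1/2$ with even $N$ is assumed.

```python
import math


def log_binom_pmf_half(n: int, k: int) -> float:
    """Log of C(n, k) / 2**n (binomial pmf with p = 1/2), via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) - n * math.log(2))


def log_normal_pdf(n: int, k: int) -> float:
    """Log of the normal density with mean n/2 and variance n/4, evaluated at k."""
    sigma2 = n / 4
    d = k - n / 2
    return -d * d / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)


def ratio(n: int, k: int) -> float:
    """Binomial pmf divided by the approximating normal pdf at k."""
    return math.exp(log_binom_pmf_half(n, k) - log_normal_pdf(n, k))


def first_offset_below(n: int, threshold: float = 0.9):
    """Smallest offset d >= 0 from n/2 at which the ratio drops below threshold.

    Returns None if the ratio never drops below the threshold.
    """
    for d in range(n // 2 + 1):
        if ratio(n, n // 2 + d) < threshold:
            return d
    return None


if __name__ == "__main__":
    n = 100_000
    sigma = math.sqrt(n) / 2
    # Asymptotic prediction for the 10% threshold: k = N^(3/4) / 2 * y
    y = (12 * math.log(1 + 1 / 9)) ** 0.25
    print("predicted offset:", n ** 0.75 / 2 * y)
    print("ratio at 52981:", ratio(n, 52981))
    d = first_offset_below(n)
    print("first offset with ratio < 0.9:", d, "=", d / sigma, "standard deviations")
```

Up to floating-point rounding, this should reproduce the values in the answer: a ratio of about $0.90153$ at $52981$, and a first drop below $0.9$ at offset $2994$. Changing `n` shows how quickly the $N^{3/4}$ prediction becomes accurate.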