How to set boundaries when approximating a discrete RV with a normally distributed RV?

I am approximating a random variable $S_n \sim \text{Bin}(n,p)$ with $p=0.02$ by using $Z_n=\frac{S_n-np}{\sqrt{np(1-p)}}$, which is approximately $\mathcal{N}(0,1)$ for large enough $n$ (by the central limit theorem).

My problem is how to set interval boundaries when calculating probabilities with that approach.

For example, when calculating the chance of $2$ or more (edited the value, but not important) successes in $100$ trials, which interval should $S_n$ lie in? Is it $[2,100]$ (meaning $S_n\geq2$) or $(1, 100]$ (meaning $S_n>1$), or something in between like $[1.5, 100]$?

Calculation for $S_{100} \in (1,100]$: $$P(S_{100} \in (1,100])=1-P(S_{100} \notin (1,100])=1-P\Big[\frac{S_{100}-0.02\cdot100}{\sqrt{100\cdot0.02\cdot0.98}}\leq \frac{1-0.02\cdot100}{\sqrt{100\cdot0.02\cdot0.98}}\Big]=1-\Phi(-0.71)=\Phi(0.71)\approx0.76$$

Calculation for $S_{100} \in [2,100]$: $$P(S_{100} \in [2,100])=1-P(S_{100} \notin [2,100])=1-P\Big[\frac{S_{100}-0.02\cdot100}{\sqrt{100\cdot0.02\cdot0.98}}< \frac{2-0.02\cdot100}{\sqrt{100\cdot0.02\cdot0.98}}\Big]=1-\Phi(0)=0.5$$

Neither result is a good approximation of the exact value of about $0.6$ computed from the binomial distribution. I strongly suspect this is due to a poor choice of boundaries.

Best answer

It seems the task is to find $P(X > 2) = 1-P(X \le 2)$ when $X \sim \mathsf{Binom}(n = 100, p=0.02).$ So the computation reduces to finding $P(X \le 2) = 0.6767,$ to four places. Exact computation in R is shown below in three ways: first, using the R binomial CDF function pbinom; second, summing the binomial probabilities from dbinom; third, using R as a calculator to compute the three necessary terms of the binomial PMF formula.

pbinom(2, 100, .02)
[1] 0.6766856

sum(dbinom(0:2, 100, .02))
[1] 0.6766856

.98^100 + 100*.98^99*.02 + choose(100,2)*.98^98*.02^2
[1] 0.6766856

For $n$ as large as $100$ and $p$ as small as $0.02$ one can get a reasonable approximation to $\mathsf{Binom}(100, 0.02)$ by using $\mathsf{Pois}(\lambda = 2).$ The approximation gives $0.6767,$ to four places.

ppois(2,2)
[1] 0.6766764
exp(-2)*(2^0 + 2 +2^2/2)
[1] 0.6766764
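
As a worked check of the Poisson figure, assuming the rate is taken as $\lambda = np = 100 \cdot 0.02 = 2$ (the usual choice for this approximation): $$P(X \le 2) \approx \sum_{k=0}^{2} e^{-\lambda}\frac{\lambda^k}{k!} = e^{-2}\Big(1 + 2 + \frac{2^2}{2}\Big) = 5e^{-2} \approx 0.6767$$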

I do not believe a normal approximation is appropriate for this problem. The normal distribution with matching mean and standard deviation is $Y \sim\mathsf{Norm}(\mu = 2, \sigma = 1.4).$ With a continuity correction we would have $P(Y \le 2.5) = 0.6395,$ which does not give even two-place accuracy. Standardizing and using printed normal CDF tables would introduce some rounding error, even with some interpolation.

sqrt(100*.02*.98)
[1] 1.4
pnorm(2.5, 2, 1.4)
[1] 0.6395076

(2.5-2)/1.4
[1] 0.3571429
pnorm(.357)    # in table, interpolate btw .35 & .36 ...
[1] 0.6394541  # ... to get about .639 or .640, NOT .6767
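
The same comparison can be sketched for the event in the question as currently worded, $P(S_{100} \ge 2)$, by trying each of the three candidate lower boundaries $1$, $1.5$ and $2$ against the exact binomial value. This is only an illustration: the variable names (n, p, mu, sigma, cutoffs, approx) are made up for this sketch and are not from the original post.

```r
# Sketch: candidate cut-offs for P(S >= 2), S ~ Binom(100, 0.02).
# All variable names here are illustrative assumptions.
n <- 100; p <- 0.02
mu <- n * p                      # mean = 2
sigma <- sqrt(n * p * (1 - p))   # standard deviation = 1.4

exact <- 1 - pbinom(1, n, p)     # exact P(S >= 2) = 1 - P(S <= 1)

cutoffs <- c(1, 1.5, 2)          # (1,100], [1.5,100], [2,100]
approx <- 1 - pnorm(cutoffs, mu, sigma)   # normal upper-tail probabilities

print(round(exact, 4))
print(round(data.frame(cutoffs, approx, error = approx - exact), 4))
```

Under these assumptions the continuity-corrected boundary $1.5$ comes closest of the three, but it is still off by roughly $0.04$, which is in line with the point that the normal approximation is poor here.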

The following figure shows the exact binomial probabilities (bars), approximate Poisson probabilities (centers of open circles), and the approximating normal density curve. At the scale of this graph it is difficult to see the slight differences between exact binomial and approximate Poisson probabilities. The normal curve is a very poor fit.

[Figure: exact binomial probabilities (bars), Poisson probabilities (open circles), and the approximating normal density curve]

k = 0:10;  PDF = dbinom(k, 100, .02)      # exact binomial probabilities
PDF.pois = dpois(k, 2)                    # Poisson approximation
hdr = "BINOM(100, .02) with Poisson & Normal Aprx."
plot(k, PDF, type="h", xlim=c(-1.5,10), lwd=2,
     col="blue", main=hdr)
points(k, PDF.pois, col="red")
curve(dnorm(x, 2, 1.4), add=TRUE, lwd=2)  # normal density curve
abline(h=0, col="green2")
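
What the plot shows can also be put into rough numbers. The sketch below compares, at the integers $k = 0, \dots, 10$, the largest absolute gap between the exact binomial probabilities and each approximation. Evaluating the normal density at integer points is a crude stand-in for the probability of a unit-width interval, and the names pmf.binom, pmf.pois, dens.norm, err.pois and err.norm are chosen here for illustration only.

```r
# Sketch: quantify the fit at the integers k = 0, ..., 10.
# Variable names are illustrative assumptions.
k <- 0:10
pmf.binom <- dbinom(k, 100, 0.02)   # exact binomial probabilities
pmf.pois  <- dpois(k, 2)            # Poisson approximation
dens.norm <- dnorm(k, 2, 1.4)       # normal density evaluated at each k

err.pois <- max(abs(pmf.pois - pmf.binom))    # worst Poisson gap
err.norm <- max(abs(dens.norm - pmf.binom))   # worst normal gap
print(c(poisson = err.pois, normal = err.norm))
```

The Poisson gap should come out much smaller than the normal one, matching the visual impression from the figure.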