Confidence interval for Binomial Distribution


I have recently heard that when constructing confidence intervals for a binomial distribution with a small success probability and a large sample size, it is best to use a Poisson approximation. I am trying to verify a theoretical result with a simulation. My theoretical probability is 0.00008539, and after 26,000,000 simulated trials the simulation reported 2,233 successes (a sample success probability of 0.00008588). The theoretical number of successes is 2220.2. I understand the 95% confidence interval is $\lambda \pm 1.96\sqrt{\lambda/n}$, but in determining whether my simulation is working correctly, I am not sure when to use the sample value $\hat{\lambda}$ and when to use the theoretical $\lambda$. Any help appreciated!


The Poisson approximation to the Binomial is that for $p$ small and $n$ large,

$$\mathrm{Bin}(n, p) \approx \mathrm{Pois}(np);$$

in other words, we're trying to find $\lambda = np$, and your estimate $\hat \lambda = 2233$ is the number of observed successes.

The $n$ in your confidence interval formula is a different quantity from the binomial $n$ above: it is the number of independent Poisson observations, which in your case is 1, since the whole simulation yields a single count. So you would get the confidence interval $\hat \lambda \pm 1.96 \sqrt{\hat\lambda}$ for $\lambda$, and dividing that by the number of trials $26\cdot10^6$ gives your confidence interval for $p$.

By my numbers, you should end up with the confidence interval $0.00008588 \pm 0.00000356$, which easily contains your theoretical value of $0.00008539$.
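The arithmetic above can be checked with a short script. This is only a sketch using the figures from the question; the variable names are illustrative, and $1.96$ is the usual rounded 95% normal quantile.

```python
import math

# Figures taken from the question.
n = 26_000_000          # number of simulated trials
x = 2233                # observed number of successes
p_theory = 0.00008539   # theoretical success probability

z = 1.96                # rounded two-sided 95% normal quantile

# A single Poisson observation: lambda_hat is just the observed count.
lam_hat = x
half_width = z * math.sqrt(lam_hat)   # normal-approximation margin for lambda

# Divide by the number of trials to get an interval for p.
p_hat = lam_hat / n
p_margin = half_width / n
p_lo, p_hi = p_hat - p_margin, p_hat + p_margin

print(f"lambda: {lam_hat} +/- {half_width:.1f}")
print(f"p:      {p_hat:.8f} +/- {p_margin:.8f}")
print("theoretical p inside interval:", p_lo <= p_theory <= p_hi)
```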


If you are already using a normal (Wald) approximation for the confidence interval (as evidenced by the use of $1.96$) then there is no benefit to using the Poisson approximation. You would just have $$n = 26 \times 10^6, \quad X = 2233, \quad \hat p = \frac{X}{n}, \tag{1}$$ and your $100(1-\alpha)\%$ Wald confidence interval is

$$\hat p \pm z_{\alpha/2}^* \sqrt{\frac{\hat p(1-\hat p)}{n}}. \tag{2}$$ With $\alpha = 0.05$, so that $z_{\alpha/2}^* \approx 1.95996$, this gives a margin of error of $3.56205 \times 10^{-6}$, and the resulting $95\%$ Wald confidence interval is $$(8.23226, 8.94467) \times 10^{-5}. \tag{3}$$

Notice that the Poisson model would have $\hat \lambda = n \hat p = X$, thus the interval $(2)$ becomes $$\frac{1}{n} \left( n \hat p \pm z_{\alpha/2}^* \sqrt{n\hat p(1-\hat p)} \right) = \frac{1}{n} \left( \hat \lambda \pm z_{\alpha/2}^* \sqrt{\hat \lambda (1 - \hat p)} \right). \tag{4}$$ When $\hat p \approx 0$, this becomes the Poisson interval with $1$ observation, scaled by the sample size.
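As a numerical illustration of $(2)$ and $(4)$, the following sketch computes the Wald margin and compares it with the scaled one-observation Poisson margin. The variable names are illustrative; the normal quantile comes from the standard library's `statistics.NormalDist` rather than the rounded $1.96$.

```python
import math
from statistics import NormalDist

n = 26_000_000   # number of trials (from the question)
x = 2233         # observed successes (from the question)
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.95996
p_hat = x / n

# Wald margin from (2) and the resulting interval (3).
wald_margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - wald_margin, p_hat + wald_margin)

# One-observation Poisson margin, scaled by the sample size.
poisson_margin = z * math.sqrt(x) / n

# By (4), the two margins differ only by the factor sqrt(1 - p_hat).
ratio = wald_margin / poisson_margin

print(f"Wald margin:    {wald_margin:.5e}")
print(f"Wald interval:  ({wald[0]:.5e}, {wald[1]:.5e})")
print(f"Poisson margin: {poisson_margin:.5e}")
print(f"ratio:          {ratio:.8f}")
```

Because $\hat p$ is of order $10^{-4}$ here, the factor $\sqrt{1-\hat p}$ is very close to $1$, so the two intervals agree to several significant figures.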

By comparison, the $95\%$ Wilson score interval is $$(8.23957, 8.95213) \times 10^{-5} \tag{5}$$ and the exact $95\%$ (Clopper-Pearson) interval is $$(8.23591, 8.95222) \times 10^{-5}. \tag{6}$$ The interval $(6)$ is exact in the sense that it is guaranteed to have at least the nominal coverage probability.
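For readers who want to reproduce $(5)$ and $(6)$, here is a minimal standard-library sketch. The function names are hypothetical helpers, not from any package. The Wilson interval uses its closed form; the Clopper-Pearson bounds are found by bisection on the binomial CDF, which is summed directly from log-probabilities. That direct sum is an assumption of convenience that works here because the count is only a few thousand; a statistics library's beta quantile function would be the more usual route.

```python
import math
from statistics import NormalDist

n = 26_000_000   # number of trials (from the question)
x = 2233         # observed successes (from the question)
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)


def wilson_interval(x, n, z):
    """Closed-form Wilson score interval for a binomial proportion."""
    p_hat = x / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), with each pmf term built in log space."""
    if k < 0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    terms = (
        math.exp(log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                 + j * log_p + (n - j) * log_q)
        for j in range(k + 1)
    )
    return min(1.0, math.fsum(terms))


def solve_decreasing(f, target, lo, hi, iters=60):
    """Bisection for f(p) = target, where f is decreasing in p on (lo, hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, alpha):
    """Clopper-Pearson interval by inverting binomial tails (needs 0 < x < n)."""
    p_hat = x / n
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2, 0.0, p_hat)
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2, p_hat, 1.0)
    return lower, upper


wilson = wilson_interval(x, n, z)
exact = clopper_pearson_interval(x, n, alpha)
print(f"Wilson:          ({wilson[0]:.5e}, {wilson[1]:.5e})")
print(f"Clopper-Pearson: ({exact[0]:.5e}, {exact[1]:.5e})")
```

The bisection relies on the binomial CDF being decreasing in $p$ for a fixed count, which is why a single helper serves both bounds.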