Continuity Correction vs Confidence Interval

I'm curious about the normal approximation, for example in this case: Rolling dice Probability that Sum.

What is the difference between a continuity correction and a confidence interval here? Or are they not related at all?

Let's say that in the problem above we need the approximation to be within $\pm 0.03 \%$. What needs to be changed in the accepted answer?

There is 1 best solution below

Continuity correction and confidence interval are unrelated. In the case you linked, there is no confidence interval. A continuity correction is used when approximating a discrete distribution using a continuous one (the classical example is the normal approximation to the binomial).

For a simpler example, imagine I'm flipping a coin 20 times and counting the number of heads. Obviously, this is a binomial random variable with $n = 20$ and $p = 0.5$. Say we want to find the probability that we get 10 to 15 heads. We could calculate the probability of getting 10, 11, 12, 13, 14, and 15 heads using binomial probability and then sum, but that is a lot of work. I'd rather use a normal approximation to do it in one fell swoop.

We know that the mean of a $\text{binomial}(20,0.5)$ variable is $\mu = 20 \cdot 0.5 = 10$ and the variance is $\sigma^2 = 20 \cdot 0.5 \cdot (1-0.5) = 5$. If we ignore the continuity correction, we can plug this into a normal cdf (on a TI 84) as $\texttt{normalcdf(10,15,10,}\sqrt{\texttt{5}}) \approx 0.487$. But if we visualize a binomial distribution as a histogram (see here) we can see that each box is centered horizontally at the integer and extends by 0.5 in either direction. So in essence we miss (roughly) half of the probability on the $10$ box and half of the probability on the $15$ box. This is what the continuity correction corrects. By going from $9.5$ to $15.5$ we can include the entirety of both boxes. Calculating it, $\texttt{normalcdf(9.5,15.5,10,}\sqrt{\texttt{5}}) \approx 0.582$. For completeness' sake, let's calculate the true binomial probability, $\text{P}(X \leq 15) - \text{P}(X \leq 9)$: $\texttt{binomcdf(20,0.5,15)} - \texttt{binomcdf(20,0.5,9)} \approx 0.582$. So we can see that the approximation with the continuity correction is much more accurate.
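For anyone without a TI 84 at hand, the three numbers above can be reproduced with a short Python sketch. The helper names (`normal_cdf`, `normal_between`, `binom_between`) are my own and not from any particular library; the normal cdf is built from the standard library's `math.erf`.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def normal_between(lo, hi, mu, sigma):
    """Normal probability of the interval [lo, hi]."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

def binom_between(lo, hi, n, p):
    """Exact binomial P(lo <= X <= hi), summing the pmf term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

n, p = 20, 0.5
mu = n * p                      # 10
sigma = sqrt(n * p * (1 - p))   # sqrt(5)

uncorrected = normal_between(10, 15, mu, sigma)    # no continuity correction
corrected = normal_between(9.5, 15.5, mu, sigma)   # with continuity correction
exact = binom_between(10, 15, n, p)                # true binomial probability

print(f"uncorrected: {uncorrected:.3f}")
print(f"corrected:   {corrected:.3f}")
print(f"exact:       {exact:.3f}")
```

The corrected value lands within about 0.001 of the exact one, while the uncorrected value is off by nearly 0.1.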

It is important to note that the continuity correction does not always widen the interval. For instance, if I phrased the problem as $\text{P}(10\leq x \leq 15)$, the problem would not change, and we would still go from $9.5$ to $15.5$. However, if I instead phrased it as $\text{P}(10 <x < 15)$ (note the strict inequality), we would go from $10.5$ to $14.5$ since $10$ and $15$ are no longer included. It is also worth noting that the continuity correction has less of an impact as $n$ increases.
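The rule for which way to shift each endpoint can be sketched as a small function. This is only an illustration: `corrected_bounds` and its `include_a` / `include_b` flags are hypothetical names I made up, and the check at the end reuses the coin example with strict inequalities.

```python
from math import comb, erf, sqrt

def corrected_bounds(a, b, include_a=True, include_b=True):
    """Continuity-corrected endpoints for an integer range from a to b.

    An included endpoint is pushed outward by 0.5 so its whole histogram
    box is covered; an excluded endpoint is pushed inward by 0.5 so its
    box is left out entirely.
    """
    lo = a - 0.5 if include_a else a + 0.5
    hi = b + 0.5 if include_b else b - 0.5
    return lo, hi

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n, p = 20, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

# P(10 < X < 15): both endpoints excluded, so the interval shrinks.
lo, hi = corrected_bounds(10, 15, include_a=False, include_b=False)
approx_strict = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# Exact value for comparison: only 11, 12, 13 and 14 heads count.
exact_strict = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(11, 15))

print(lo, hi)
print(f"approx: {approx_strict:.4f}  exact: {exact_strict:.4f}")
```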

So you may ask "Why bother with the normal approximation and continuity correction when I can just calculate the true probability?" The problem with doing so is that calculating a cumulative binomial probability is computationally expensive and gets impractical with large numbers (imagine flipping the coin a million times and finding the probability of getting 400000 to 500000 heads), whereas the normal approximation is very quick regardless of size.
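As a rough sketch of the million-flip example, the normal approximation needs just two cdf evaluations no matter how large $n$ is. The code below assumes a fair coin ($p = 0.5$), as in the earlier example; `normal_cdf` is again my own helper built on `math.erf`, and the exact binomial sum is deliberately not computed here.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n, p = 1_000_000, 0.5            # assumption: fair coin
mu = n * p                       # 500000
sigma = sqrt(n * p * (1 - p))    # 500

# P(400000 <= X <= 500000) without and with the continuity correction.
plain = normal_cdf(500_000, mu, sigma) - normal_cdf(400_000, mu, sigma)
corrected_big = normal_cdf(500_000.5, mu, sigma) - normal_cdf(399_999.5, mu, sigma)

print(f"without correction: {plain:.6f}")
print(f"with correction:    {corrected_big:.6f}")
print(f"difference:         {corrected_big - plain:.6f}")
```

Here the correction moves the answer by only about 0.0004, compared with nearly 0.1 for $n = 20$, which is the shrinking impact mentioned above.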