Why is normal distribution more accurate than binomial distribution?


I'm having a tough time understanding this.

This is what I am told about comparing the two:

The probability that Saredo is late for school is 0.6.

What is the probability that in one month she is late 9 times?

Remember that one month would include 20 school days.

Using the binomial distribution

n = 20 and p = 0.6

So P(9) = 20C9 × 0.6^9 × 0.4^11 = 0.0710 = 7.10%
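The binomial figure above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are mine, not part of the original working.

```python
from math import comb

n, p, k = 20, 0.6, 9  # values taken from the question

# Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)
prob_binomial = comb(n, k) * p**k * (1 - p)**(n - k)

print(f"P(X = {k}) = {prob_binomial:.4f}")
```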

Using the normal distribution

n = 20 and p = 0.6

μ = np = 20 × 0.6 = 12

σ = √(np(1 − p)) = √(20 × 0.6 × 0.4) = 2.19

The boundaries we will use to find the probability of 9 will be 8.5 and 9.5.

z(9.5) = (9.5 − 12) / 2.19 = -1.14

z(8.5) = (8.5 − 12) / 2.19 = -1.60

Using the table, -1.14 gives us a probability of 0.1271

-1.60 gives us a probability of 0.0548

0.1271 − 0.0548 = 0.0723 = 7.23%
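The same continuity-corrected calculation can be sketched in Python with `statistics.NormalDist` (Python 3.8+). The variable names are illustrative. Because the z-scores are not rounded to two decimals here, the result differs slightly from the table-based figure.

```python
from math import sqrt
from statistics import NormalDist

n, p, k = 20, 0.6, 9  # values taken from the question

mu = n * p                      # mean of the binomial
sigma = sqrt(n * p * (1 - p))   # standard deviation of the binomial

# Continuity correction: treat "exactly 9" as the interval [8.5, 9.5]
normal = NormalDist(mu, sigma)
prob_normal = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

print(f"Normal approximation: {prob_normal:.4f}")
```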

The Comparison

The binomial distribution gave us a probability of 7.1% (to one decimal place).

The normal distribution gave us a probability of 7.2%

This shows us that the normal distribution can give us accurate approximations. However, in this case, it clearly involved more work.

I do not understand why it is more accurate, can anyone shed some light on this?


There are 3 best solutions below

1
On BEST ANSWER

The reason is that the normal distribution is actually a pretty good approximation to the binomial distribution.

If you look at the shape of a binomial distribution, as your $n$ gets large, it begins to look more and more like a normal distribution. Although it seems like more work, when $n$ is very large, it's actually much easier to use the normal approximation.
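One rough way to see this numerically is to measure the largest gap between the binomial probabilities and the matching normal density as $n$ grows. The helper `max_gap` and the choice $p = 0.6$ are my own assumptions for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_gap(n, p):
    """Largest absolute difference between the binomial pmf and the
    normal density with the same mean and variance, over k = 0..n."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return max(
        abs(comb(n, k) * p**k * (1 - p)**(n - k) - normal.pdf(k))
        for k in range(n + 1)
    )

# p = 0.6 is assumed here, matching the question
for n in (20, 100, 500):
    print(n, max_gap(n, 0.6))
```

The gap should shrink as $n$ increases, which is the behaviour the de Moivre-Laplace theorem describes.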

The mathematical reasons behind this coincidence are described by the de Moivre-Laplace Theorem, and it is basically a special case of the central limit theorem -- which turns out to be immensely important in statistics!

Now, because the normal distribution is a continuous distribution, you will probably compute an answer to arbitrarily many decimal places. But this does not mean the result is more accurate. In fact, the answer is always less accurate, because the binomial distribution gives us the exact result.

Computing the exact binomial result numerically can introduce numerical error if you are not careful, because of the large factorials involved. But if you imagine yourself with a magical computer with infinite precision and memory, the binomial distribution will always return the exact answer for problems of this type.
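For small problems, exact rational arithmetic gets close to that "magical computer". The sketch below assumes $p = 0.6$ is meant exactly as $3/5$ and uses Python's `fractions.Fraction` so that no rounding occurs until the final conversion to a float.

```python
from fractions import Fraction
from math import comb

n, k = 20, 9
p = Fraction(3, 5)  # assumption: 0.6 is taken to be exactly 3/5

def pmf(j):
    """Exact binomial probability of j successes, as a Fraction."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

exact = pmf(k)
total = sum(pmf(j) for j in range(n + 1))  # should be exactly 1

print(exact)          # exact rational value
print(float(exact))   # rounded only at this last step
```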

0
On

I get essentially the same thing for the normal approximation, roughly $7.19\%$ versus the binomial's about $7.08\%$.

The binomial distribution (under our assumptions) gives an exact answer. It is a little surprising how well the normal approximation (with continuity correction) did in this case. A sample size of $20$ is smallish for the normal to give this good an approximation to the binomial. In my experience this performance by the normal approximation was better than usual for this sample size. The normal approximation was a little lucky.

For this particular problem, given a calculator that finds $\binom{20}{9}$ with no trouble, we do not need to use an approximation. But for say $n=60$, and the probability that the number of absences is between $28$ and $37$, the normal approximation would be less work, unless we had software to compute the appropriate probability under the binomial distribution.
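With software, both routes are short. The sketch below compares them for the $n=60$ example, under two assumptions that the answer does not state: $p = 0.6$ as in the question, and "between $28$ and $37$" meaning both endpoints are included.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 60, 0.6       # p = 0.6 is assumed, carried over from the question
low, high = 28, 37   # assumed inclusive on both ends

# Exact: sum the binomial pmf over the range
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(low, high + 1))

# Normal approximation with continuity correction
normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = normal.cdf(high + 0.5) - normal.cdf(low - 0.5)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx:  {approx:.4f}")
```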

0
On

With the binomial distribution you can calculate the exact answer to the question. To see this, you have to look at its formula:

$P(X=k)=\pmatrix{n\\k}p^k(1-p)^{n-k}$

The probability of being late on a given day is $p$ and of not being late is $1-p$. The probability of being late on $k$ particular days and not being late on the remaining $n-k$ days is then $p^k(1-p)^{n-k}$. How many different ways are there of picking the $k$ days on which she was late from $n$ days? Exactly $\pmatrix{n\\k}$.

This is why the binomial distribution gives the exact answer.

Now, it turns out the normal distribution can be used to get an approximate answer when there is a sufficient number of samples. The reasons are less intuitive in this case, but the idea is that since the binomial coefficient $\pmatrix{n\\k}$ is difficult to calculate for large numbers, you can substitute it with a different approximate term which is easy to calculate, known as Stirling's approximation, and then with some extra manipulation you find that what you get is a normal distribution.
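As a rough illustration of the first step, Stirling's approximation $n! \approx \sqrt{2\pi n}\,(n/e)^n$ can be plugged into the binomial coefficient and compared with the exact value. The function names below are hypothetical helpers of mine, and the sketch only works for $0 < k < n$.

```python
from math import comb, e, pi, sqrt

def stirling_factorial(m):
    """Stirling's approximation to m! (for m >= 1)."""
    return sqrt(2 * pi * m) * (m / e) ** m

def stirling_binom(n, k):
    """Approximate C(n, k) by replacing each factorial with Stirling's formula.
    Only meaningful for 0 < k < n."""
    return stirling_factorial(n) / (stirling_factorial(k) * stirling_factorial(n - k))

n, k = 20, 9
exact_coeff = comb(n, k)
approx_coeff = stirling_binom(n, k)

print(exact_coeff, approx_coeff, approx_coeff / exact_coeff)
```

Even at $n = 20$ the ratio is close to 1, and it gets closer as $n$ and $k$ grow.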