How to set up normal approximation for binomial


In a particular school, 25% of first grade students do not enjoy reading. 22% of second graders do not enjoy reading. A random sample is taken of 100 first grade students, and another independent sample of 100 second graders is taken.

Part 1: Use the normal approximation to find the probability that fewer than 30 first grade students in the sample do not enjoy reading.

Part 2: Use normal approximation to find the probability that 5 more second graders than first graders in the samples do not enjoy reading.

For part 1, after applying the 0.5 continuity correction, I got $$ O \sim N(25, 18.75) $$ $$ P(O \le 29.5) = 0.851 $$

Is that right? And for part 2, I'm not exactly sure how to set it up.

Thanks!


Part 2: For second grade students p = .22, n = 100, and Y is the number of sampled students who do not enjoy reading. Much as in Part 1, we obtain $E(Y) = 22$ and $V(Y) = 17.16$.

We also need the mean and variance of $W = Y - X$, with $X$ as in Part 1. We have $E(W) = E(Y) - E(X) = 22 - 25 = -3$ and, because the two samples are independent, $V(W) = V(Y) + V(X) = 17.16 + 18.75 = 35.91$ (notice that variances ADD, even for a difference), so that the SD of the difference is 5.9924. We seek

$$P\{W = 5\} = P\{4.5 \le W \le 5.5\} \approx P\{(4.5 + 3)/5.9924 \le Z \le (5.5+3)/5.9924\} = P\{1.252 \le Z \le 1.418\} = 0.027.$$

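In Part 1, some authors might ignore the "continuity correction" that uses half-integers in the bounds. Here, however, this correction is essential to getting a sensible answer: a continuous normal distribution assigns probability 0 to the single value $W = 5$, so the event has to be widened to the interval from 4.5 to 5.5.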

As in Part 1, printed tables of the standard normal distribution require rounding the bounds to two places, so they may give a slightly different answer than the software we used. In R, the statement 'diff(pnorm(c(1.252, 1.418)))' returns 0.02719.
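The Part 2 setup can also be checked outside R. Below is a minimal Python sketch using only the standard library; the variable names (`p_first`, `p_second`, `mean_w`, `sd_w`) are illustrative and not from the original answer. It uses unrounded bounds, so the result may differ from the R value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

n = 100                        # students sampled from each grade
p_first, p_second = 0.25, 0.22

# W = Y - X: means subtract, variances add (independent samples).
mean_w = n * p_second - n * p_first
var_w = n * p_second * (1 - p_second) + n * p_first * (1 - p_first)
sd_w = sqrt(var_w)

# Continuity correction: P{W = 5} is approximated by P{4.5 <= W <= 5.5}.
approx = NormalDist(mean_w, sd_w)
prob = approx.cdf(5.5) - approx.cdf(4.5)

print(f"E(W) = {mean_w:.2f}, V(W) = {var_w:.2f}, SD(W) = {sd_w:.4f}")
print(f"P(W = 5) is approximately {prob:.4f}")
```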

A simulation of a million runs of this 200-student experiment gave 0.027, agreeing with the normal approximation to three places. A histogram of the million simulated differences in the number of reading-averse students is closely matched by a normal curve with mean -3 and standard deviation 5.9924. (It would have been possible to get an exact answer, but we did not. You might want to ponder how many binomial probabilities would have to be computed to do that.)
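For readers who want to try the exact computation, here is a hedged Python sketch (standard library only). It assumes, as in the problem statement, that the two samples are independent, so that $P\{Y - X = 5\} = \sum_x P\{X = x\}\,P\{Y = x + 5\}$. The helper `binom_pmf` is a name introduced here for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P{B = k} for B ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 100
p_first, p_second = 0.25, 0.22

# Sum over every x for which both x and x + 5 are possible counts.
terms = range(0, n - 5 + 1)
exact = sum(binom_pmf(x, n, p_first) * binom_pmf(x + 5, n, p_second)
            for x in terms)

print(f"{len(terms)} terms in the sum, exact P(W = 5) = {exact:.5f}")
```

Each term in the sum needs one binomial probability for each grade, which gives a sense of the effort the normal approximation saves.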


Part 1: p = .25 of first grade students do not enjoy reading; n = 100 first grade students selected at random; X = number out of 100 who do not enjoy reading. We seek $P\{X < 30\} = P\{X \leq 29\}.$ Using R software, the exact probability is given by 'pbinom(29, 100, .25)' which returns 0.850459.

How can this probability be approximated using the normal distribution? (First, note that n is large enough and p is far enough from 0 and 1 that such an approximation is reasonable.) We have $E(X) = np = 100(.25) = 25$ and $V(X) = np(1-p) = 18.75.$ Then $$ P\{X \le 29.5\} = P\{(X - 25)/\sqrt{18.75} \le (29.5 - 25)/\sqrt{18.75} = 1.039\} \approx P\{Z \le 1.039\} = 0.8506.$$ The 'standardized' binomial random variable $(X - 25)/\sqrt{18.75}$ is approximated by the standard normal random variable $Z$. (Notice the 'approximately equal' sign $\approx$ at the appropriate point.)

Notice that for the discrete random variable $X$ the desired probability can be expressed either as $P\{X < 30\}$ or as $P\{X \leq 29\}$. When approximating by the continuous normal random variable, one generally obtains a better approximation by using the bound 29.5, halfway between 29 and 30.

The approximate normal probability can be found either with software or by using printed tables of the standard normal distribution. (In the latter case you would probably have to use 1.04 rather than 1.039. We used software, so your answer may be slightly different.)

Notice that the result 0.8506 is not far from the exact result 0.850459. As here, normal approximations commonly give about two-place accuracy.
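The comparison between the exact binomial probability and the continuity-corrected normal approximation can be reproduced with a short Python sketch (standard library only). This is a cross-check of the R calls quoted above, not the method used in the answer; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.25

# Exact: P{X <= 29} as a sum of binomial probabilities for k = 0..29.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(30))

# Normal approximation with mean np and variance np(1 - p).
mean, sd = n * p, sqrt(n * p * (1 - p))
normal = NormalDist(mean, sd)
with_correction = normal.cdf(29.5)   # bound halfway between 29 and 30
without_correction = normal.cdf(29)  # ignores the continuity correction

print(f"exact                = {exact:.6f}")
print(f"normal, bound 29.5   = {with_correction:.4f}")
print(f"normal, bound 29     = {without_correction:.4f}")
```

Printing the uncorrected value alongside the corrected one shows how much the half-integer bound contributes to the agreement with the exact result.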