I was reading through Casella and Berger's Statistical Inference when I came across the following problem:
The probability of a twin birth is approximately 1/90, and we can assume that an elementary school will have approximately 60 children entering kindergarten (three classes of 20 each). Explain how our "statistically impossible" event can be thought of as the probability of 5 or more successes from a binomial(60, 1/90). Is this even rare enough to be newsworthy?
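For reference, here is a minimal Python sketch of the book's intended computation, $P(X \ge 5)$ for $X \sim \text{binomial}(60, 1/90)$. It uses only `math.comb` from the standard library. The helper names `binom_pmf` and `binom_tail` are illustrative, not from the book.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(Y >= k_min), computed as 1 - P(Y <= k_min - 1)."""
    return 1.0 - sum(binom_pmf(k, n, p) for k in range(k_min))

n, p = 60, 1 / 90
tail = binom_tail(5, n, p)  # probability of 5 or more twin "successes"
print(f"P(X >= 5) = {tail:.3e}")
```

If SciPy is available, `scipy.stats.binom.sf(4, 60, 1/90)` (the survival function, $P(X > 4)$) should give the same value.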
I'm struggling to understand how this is a binomial distribution. I looked up explanations for this online and got limited results. In my understanding, the probability of there being a pair of twins would be the probability of choosing two people to be twins, with the remaining $60-2 = 58$ people being twin-less.
Basically, if $X$ is the random variable indicating how many sets of twins are in the incoming class, then
$$P(X = x) = {60 \choose 2x} \Big(\frac{1}{90}\Big)^x \Big(\frac{89}{90}\Big)^{\frac{60-2x}{2}}$$
since we would choose $2x$ of the children to be twins, with each pair having probability $\frac{1}{90}$ of being twins, and then each of the remaining $\frac{60 - 2x}{2}$ pairs would have probability $\frac{89}{90}$ of not being twins.
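Plugging numbers into the proposed formula shows that it cannot be a probability mass function: the $x = 1$ term alone exceeds 1. This is a quick sketch. `proposed_pmf` is just a hypothetical name for the expression above, evaluated with $n = 60$ and $p = 1/90$.

```python
from math import comb

def proposed_pmf(x: int, n: int = 60, p: float = 1 / 90) -> float:
    """The proposed P(X = x) = C(n, 2x) * p^x * (1 - p)^((n - 2x) / 2)."""
    return comb(n, 2 * x) * p**x * (1 - p) ** ((n - 2 * x) / 2)

# At most 30 twin pairs fit in a class of 60.
values = [proposed_pmf(x) for x in range(31)]
print(f"'P(X = 1)' = {values[1]:.2f}")      # already larger than 1
print(f"sum over x = {sum(values):.2f}")    # a valid pmf would sum to 1
```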
I found an explanation here that I don't understand:
I think the book is correct "to first order"; you are double counting. Suppose there are $n$ students in a classroom. We ask each student: "Do you have a twin sibling?" If the student answers "no," he or she leaves the classroom. If the student answers "yes," we ask the student to identify the sibling, and then both of them leave. If all students were to answer "no," the combinatorial coefficient would be $\binom{n}{0}$. If only one student answered "yes," then the combinatorial coefficient isn't $\binom{n}{2}$, it is $\binom{n-1}{1}$. If exactly $k$ students answered "yes," then the combinatorial coefficient isn't $\binom{n}{2k}$, it is $\binom{n-k}{k}$.
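The quoted counting argument can be checked by brute force for small $n$. The idea is to list every sequence of answers that empties the room, where a "no" removes one student and a "yes" removes two, and then count the sequences with exactly $k$ "yes" answers. This sketch assumes the students are interviewed in a fixed order, which is how the quote's coefficient kC(n - k), i.e. $\binom{n-k}{k}$, arises. The generator name `compositions` is illustrative.

```python
from math import comb

def compositions(n: int):
    """Yield every tuple of 1s and 2s summing to n (1 = a "no", 2 = a "yes")."""
    if n == 0:
        yield ()
        return
    for step in (1, 2):
        if step <= n:
            for rest in compositions(n - step):
                yield (step,) + rest

# Compare the brute-force count with C(n - k, k) for small classrooms.
for n in range(1, 13):
    for k in range(n // 2 + 1):
        brute = sum(1 for s in compositions(n) if s.count(2) == k)
        assert brute == comb(n - k, k), (n, k)
print("brute-force counts match C(n - k, k) for n = 1..12")
```

For example, with $n = 6$ and $k = 1$ there are $\binom{5}{1} = 5$ answer sequences, whereas $\binom{6}{2} = 15$ counts every way of picking two children, including pairs that are not siblings of each other.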
I know something is wrong with my approach, but I don't understand how I would be "double counting" the number of twins.
Thank you for any help.
Personally, I think that it is reasonable to assume that if a child has a twin, then that twin is in the same class. The following analysis is based on that presumption, which may conflict with the intended analysis. I am also assuming that no one in the class is part of a group of triplets (or quadruplets, and so on).
If $f(k)$ is the probability of there being exactly $k$ pairs of twins in the class, and of there also being exactly $(60 - 2k)$ children in the class that were not born as part of a twin pair, then the final computation will be
$$1 - \left[\sum_{k = 0}^4 f(k)\right].$$
Let $p = (1/90), q = 1-p.$
Then,
$f(0) = q^{60},$
$f(1) = \binom{59}{1}p^1q^{58},$
$f(2) = \binom{58}{2}p^2q^{56},$
$f(3) = \binom{57}{3}p^3q^{54},$
$f(4) = \binom{56}{4}p^4q^{52}.$
The idea behind the computation of $f(2)$ (for example) is that there must have been exactly $58$ pertinent births, of which $2$ resulted in twins being born: the $2 \times 2 = 4$ twins and the $56$ single-birth children make up the $60$ children.
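The pattern above is $f(k) = \binom{60-k}{k} p^k q^{60-2k}$. A minimal Python sketch of the final computation follows. The constant names `P`, `Q`, `N` and the function `f` are just illustrative labels for the quantities defined above.

```python
from math import comb

P = 1 / 90  # probability that a birth is a twin birth (given in the problem)
Q = 1 - P
N = 60      # children entering kindergarten

def f(k: int, n: int = N) -> float:
    """k twin births plus n - 2k single births (n - k births in total),
    so the coefficient is C(n - k, k) rather than C(n, 2k)."""
    return comb(n - k, k) * P**k * Q ** (n - 2 * k)

# The quantity 1 - [f(0) + ... + f(4)] from above.
answer_quantity = 1 - sum(f(k) for k in range(5))

# Sum of f(k) over every feasible k (at most N // 2 twin pairs fit in the class).
total = sum(f(k) for k in range(N // 2 + 1))

print(f"1 - (f(0) + ... + f(4)) = {answer_quantity:.4f}")
print(f"sum of f(k) over all k  = {total:.4f}")
```

Printing `total` is a useful sanity check. Because the number of births, $n - k$, changes with $k$, the $f(k)$ need not sum to exactly 1. For $n = 2$, for instance, $f(0) + f(1) = q^2 + p = 1 - pq$. So $1 - \sum_{k=0}^{4} f(k)$ also absorbs that leftover mass, and dividing each $f(k)$ by `total` is one way to renormalize.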
Edit
As discussed in another response, a critical question is:
if, in the class of children, $k$ of the pertinent births resulted in twins, how many pertinent births did not result in twins?
Is it $(60 - k)$, $(60 - 2k)$, or some number between the two?
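The two extremes in that question can be written out explicitly. Under the presumption above, each of the $k$ twin births contributes both children to the class, which gives $60 - 2k$ single births and $60 - k$ births in all. Suppose instead (an assumption for illustration, not part of the analysis above) that each twin's sibling is always outside the class. Then there are $60 - k$ single births and $60$ births in all, which recovers the book's binomial$(60, 1/90)$:

$$
\begin{aligned}
\text{sibling always in the class:}\quad & \binom{60-k}{k}\, p^k q^{60-2k} && (60-k \text{ births}),\\
\text{sibling never in the class:}\quad & \binom{60}{k}\, p^k q^{60-k} && (60 \text{ births}).
\end{aligned}
$$

Mixtures of the two cases correspond to the "some number between" option.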