Why do we use a Poisson distribution here rather than Binomial?

Approximately 80,000 marriages took place in the state of New York last year. Estimate the probability that for at least one of these couples, (a) both partners were born on April 30

I understand how to find the answer (~0.45), but on an exam, what wording signals that this question is asking for a Poisson distribution rather than a binomial one?

There is 1 best solution below

Let's do the calculation both ways and see what happens:

Under a binomial model, we have $n = 80000$ couples. Assume that any individual is equally likely to be born on any one of the $365$ days in a year, and that partners in marriage are no more or less likely to share a birthday than any two randomly selected people. Then the probability that both partners in a given couple were born on April 30 is $p = \frac{1}{365^2} = \frac{1}{133225} \approx 7.5061 \times 10^{-6}$.

Then $$X \sim \operatorname{Binomial}(n,p)$$ models the random number of married couples who were both born on April 30, and the question is asking for $$\Pr[X \ge 1] = 1 - \Pr[X = 0] = 1 - \binom{n}{0} p^0 (1-p)^{n-0} = 1 - (1 - p)^n \approx 0.45145729806317.$$
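As a rough check of the binomial figure, here is a minimal Python sketch (not part of the original answer). It evaluates $1 - (1-p)^n$ directly and also through logarithms, which avoids repeatedly rounding a number very close to $1$. The variable names are illustrative only.

```python
import math

n = 80_000        # number of couples, as given in the question
p = 1 / 365**2    # chance that both partners were born on April 30

# Direct evaluation of 1 - (1 - p)^n in double precision.
direct = 1 - (1 - p) ** n

# Same quantity via (1 - p)^n = exp(n * log(1 - p)).
# log1p and expm1 keep accuracy when p is tiny.
stable = -math.expm1(n * math.log1p(-p))

print(direct, stable)
```

On a computer both forms agree closely; the logarithmic form is the one that scales better as $p$ shrinks.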

Under a Poisson model, the event rate of couples sharing an April 30 birthday is $$\lambda = np = \frac{80000}{133225} = \frac{3200}{5329} \approx 0.600488.$$ This rate has the units of couples per year, under the assumption that exactly $80000$ couples marry in New York each year; equivalently, we can characterize this rate in units of couples per cohort of $80000$ couples. Then $$Y \sim \operatorname{Poisson}(\lambda)$$ models the random number of couples we actually observe in this cohort, and we want to calculate $$\Pr[Y \ge 1] = 1 - \Pr[Y = 0] = 1 - e^{-\lambda} \frac{\lambda^0}{0!} \approx 1 - e^{-0.600488} \approx 0.45145\color{red}{6061826455},$$ where the red digits differ from the result obtained using the (exact) binomial model.
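The comparison between the two models can be reproduced with a short Python sketch (an illustration added here, with made-up variable names). It computes both probabilities from the same $n$ and $p$ and reports the gap between them.

```python
import math

n = 80_000
p = 1 / 365**2
lam = n * p       # Poisson rate, lambda = np

# Exact binomial: Pr[X >= 1] = 1 - (1 - p)^n
binomial = -math.expm1(n * math.log1p(-p))

# Poisson approximation: Pr[Y >= 1] = 1 - e^{-lambda}
poisson = -math.expm1(-lam)

gap = binomial - poisson
print(f"binomial = {binomial:.12f}")
print(f"poisson  = {poisson:.12f}")
print(f"gap      = {gap:.3e}")
```

The gap is on the order of $10^{-6}$, which is why the two results agree in their first five decimal places.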

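As we can see, the Poisson approximation is quite acceptable: it agrees with the exact binomial result to five decimal places. This is because $n$ is large and $p$ is small, and it is this combination (a very large number of independent trials, each with a tiny probability of success) that signals the use of a Poisson model to answer the question.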
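A short sketch of why large $n$ and small $p$ matter (a standard expansion, added here as an illustration rather than taken from the answer): writing $\lambda = np$ and expanding the logarithm for small $p$, $$(1-p)^n = \exp\bigl(n \log(1-p)\bigr) = \exp\left(-np - \frac{np^2}{2} - \frac{np^3}{3} - \cdots\right) \approx e^{-\lambda}\left(1 - \frac{\lambda p}{2}\right).$$ So the binomial and Poisson values of $\Pr[X = 0]$ differ by a relative amount of roughly $\lambda p / 2$, which for $n = 80000$ and $p = \frac{1}{133225}$ is about $2.3 \times 10^{-6}$. If $p$ were not small, the neglected terms would no longer be negligible, however large $n$ is.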

But why not just use the binomial model? Well, handheld scientific calculators might struggle to compute the quantity $(1-p)^n$, but any scientific calculator would be able to calculate $e^{-\lambda}$. To be clear, it is not wrong per se to use a binomial model; it is just potentially computationally inefficient.

Indeed, suppose we instead assume that there are approximately $1.35$ million marriages per year in the United States, and ask for the probability that, in a given year, there are more than $10$ couples in the US in which both partners were born on April 30. Then the binomial calculation definitely becomes problematic for a handheld calculator: $$\Pr[X \ge 11] = 1 - \sum_{x=0}^{10} \binom{1350000}{x} \frac{1}{133225^x} \left(1 - \frac{1}{133225}\right)^{1350000-x},$$ and evaluating this sum as written, with its enormous binomial coefficients, calls for software such as a computer algebra system that supports arbitrary precision arithmetic. The result is approximately $0.4336242316937395$.

However, a Poisson approximation is still tractable even with a scientific calculator: the event rate is now $\lambda = \frac{1350000}{133225} \approx 10.1332$, hence $$\Pr[Y \ge 11] = 1 - \sum_{y=0}^{10} e^{-\lambda} \frac{\lambda^y}{y!} \approx 0.433624\color{red}{169114307},$$ where again the red digits differ from the binomial result.
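For readers who want to reproduce both tail probabilities on a computer, here is one possible Python sketch (not the method used in the answer above). Instead of forming the huge binomial coefficients, it builds each term from the previous one using the ratio $\Pr[X = x+1] / \Pr[X = x] = \frac{n-x}{x+1} \cdot \frac{p}{1-p}$, and the analogous ratio $\frac{\lambda}{y+1}$ for the Poisson model. The function names are hypothetical.

```python
import math

def binomial_upper_tail(n, p, k):
    """Pr[X >= k] for X ~ Binomial(n, p), using the pmf ratio recurrence."""
    term = math.exp(n * math.log1p(-p))   # Pr[X = 0] = (1 - p)^n
    cdf = 0.0
    for x in range(k):                    # accumulate Pr[X = 0..k-1]
        cdf += term
        term *= (n - x) / (x + 1) * p / (1 - p)
    return 1 - cdf

def poisson_upper_tail(lam, k):
    """Pr[Y >= k] for Y ~ Poisson(lam), using the pmf ratio recurrence."""
    term = math.exp(-lam)                 # Pr[Y = 0] = e^{-lam}
    cdf = 0.0
    for y in range(k):                    # accumulate Pr[Y = 0..k-1]
        cdf += term
        term *= lam / (y + 1)
    return 1 - cdf

n = 1_350_000     # approximate US marriages per year, from the text
p = 1 / 365**2

b = binomial_upper_tail(n, p, 11)
q = poisson_upper_tail(n * p, 11)
print(b, q, b - q)
```

The recurrence keeps every intermediate value at a moderate size, so ordinary floating point is enough here; the Poisson version is the one that remains practical on a handheld calculator.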