The Birthday Paradox with combinatorics

656 Views Asked by At

Assume a year has $365$ days, how many are required to have a $50%$ chance of $2$ people having the same birthday?

According to Scientific American, there are $23$ people needed to achieve the goal.

$${\begin{pmatrix} 23 \\ 2\end{pmatrix}}=253$$ $$1-(1-\frac{1}{365})^{253}\approx 0.50048$$

However, I have a different approach but I'm not sure if this is correct.
One could be any day in a year. And $23$ people would be $365^{23}$ possibilities.
Suppose no one in $23$ people has the same birthday. That would be :- ${\begin{pmatrix} 365 \\ 23 \end{pmatrix}} \cdot23!$

The possibility of having at least 2 people having the same birthday is :-

$$1-\frac{{\begin{pmatrix} 365 \\ 23 \end{pmatrix}} \cdot23!}{365^{23}}\approx0.5072972$$

Although $0.50048\approx 0.5072972$, these two numbers are not equal to each other.

Can anyone explain this difference and the reason behind this phenomenon?

3

There are 3 best solutions below

0
On BEST ANSWER

The Scientific American's calculation is not quite accurate: it supposes that every non-match between two people is independent from any other non-match between two people, but that is not the case.

To see this, consider just 3 people: A, B, and C.

What is the chance that they don't share a birthday?

Scientific American's approach is to say: well, we need the following three events:

E1. A and B don't share a birthday

E2. A and C don't share a birthday

E3. B and C don't share a birthday

Now, individually, the probability of each of those events is $\frac{364}{365}$

So, they say: $P(E1 \cap E2 \cap E3) = P(E1) \cdot P(E2) \cdot P(E3) = (\frac{364}{365})^3$

But again, that is assuming these three events are independent ... and they are not! For example, once we know that $A$ does not share a birthday with either $B$ or $C$ (i.e. once we have events $E1$ and $E2$) the probability of $B$ not sharing a birthday with $C$ (i.e. event $E3$) is no longer a simple $\frac{364}{365}$, because it is no longer possible for $B$ or $C$ to have their birthday on the day that $A$ has their birthday.

Indeed, the more people we know do not share a birthday with $B$ and $C$, the fewer options become available for $B$ and $C$ to have their birthdays, and this will in fact increase the chances of $B$ and $C$ to share a birthday, and thus decrease the chance of them not sharing a birthday.

To illustrate how the probability of $B$ and $C$ not sharing their birthday is affected once we know they don't share the same birthday with $A$ (how, indeed, that probability has gone down), consider what happens when we consider only three possible days that each of $A$, $B$, and $C$ can have their birthday.

First of all, the probability of $B$ and $C$ sharing their birthday not knowing anything about $A$ is $\frac{1}{3}$, and hence there is a $\frac{2}{3}$ probability that they don't share their birthday.

OK, but now let's add the information that $B$ nor $C$ share their birthday with $A$. That means that there are now only two days left for $B$ and $C$ to have their birthday, meaning that now there is a $\frac{1}{2}$ probability that they share their birthday, and hence a $\frac{1}{2}$ probability that they don't share their birthday. So, indeed, the probability that $B$ and $C$ don't share their birthday has gone down once we know they don't share their birthday with $A$.

And indeed, note what happens when we have even fewer possible birthdays: if we get to a point where we have more people than possible birthdays, then the probability of two people not matching their birthdays given that no one else shares their birthday with any of these two people simplybecomes $0$.

In sum: the probability that Scientific American assumes for each non-match is too high.

Now, with the actual birthday problem we are not considering $3$ people and $3$ possible birthdays, but $23$ people with $365$ possible birthdays. And with so many more birthdays than people, it turns out that Scientific American's calculations are not going to be that far off. But still: the $\frac{364}{365}$ is a little too high for the remaining non-matches once you already know other non-matches, meaning that the $(\frac{364}{365})^{253}$ is a little too big, and thus when subtracted from $1$, the Scientific American's calculation ends up being a little too low.

But of course, it would be hard to tell this beforehand (note that the $\frac{364}{365}$ does get raised to the $253$-th power!) ... It happened to get 'close enough' for this problem ... but I really don't like to think that it was an a priori justified simplification. You certainly can't use this kind of approximation to establish that $23$ people was the magic 'breaking point' as opposed to $22$ people, or maybe even fewer. Indeed, they we're lucky that with their calculation for $23$ people it just stayed above $0.5$.

Your calculation, however, is completely correct. Good job!!

0
On

The SciAm article seems to be wrong. The reason is, I believe, this: let's say with three (rather than $23$) birthdays $b_1,b_2,b_3$ they treat $b_1\ne b_2$, $b_2\ne b_3$ and $b_1\ne b_3$ as independent events, which they are not:

$$P(b_1\ne b_3\mid \lnot(b_1\ne b_2), b_2\ne b_3)=1\ne P(b_1\ne b_3)$$

for example. (Knowing that $b_1=b_2$ and $b_2\ne b_3$ makes it a certainty that $b_1\ne b_3$.)

0
On

It's easy to see that SA's argument is incorrect. If we have 366 people, then according to SA's argument, we should take $1- \left(1 - \frac 1 {365}\right) ^{\begin{pmatrix} 366 \\ 2\end{pmatrix}} $. We don't need to calculate the exact value to know this won't be exactly equal to $1$. But the probability of having 366 different people all with different birthdays is clearly zero, because there aren't 366 different birthdays to have (this is all ignoring leap days).

The problem with SA's formula is that it calculates the number of pair, and calculates the probability of a particular pair sharing a birthday (364/365), and treats all of those events as being independent. But the probability of Alice and Bob having different birthdays and Bob and Charles having different birthdays isn't (364/365)^2. If Alice and Bob don't have the same birthday, then there are 363 birthdays left for Charles to have that's different from them, not 364.