birthday problem - expected number of collisions

29.7k Views Asked by At

There are many descriptions of the "birthday problem" on this site — the problem of finding the probability that in a group of $n$ people there will be any (= at least 2) sharing a birthday.

I am wondering how to find instead the expected number of people sharing a birthday in a group of $n$ people. I remember that expectation means the weighted sum of the probabilities of each outcome:

$$E[X]=\sum_{i=0}^{n-1}x_ip_i$$

And here $x$ must mean the number of collisions involving $i+1$ people, which is $n\choose i$. All $n$ people born on different days means no collisions, $i=0$; two people born on the same day means $n$ collisions, $i=1$; all $n$ people born on the same day means $n$ collisions, $i=n-1$.

Since the probabilities of three or more people with the same birthday are vanishingly small compared to two people with the same birthday, and decreases faster than $x$ increases, is it correct to say that this expectation can be approximated by

$$E[X]\approx {n\choose 0}p_{no\ collisions}+{n\choose 1}p_{one\ collision}$$

This doesn't look right to me and I'd appreciate some guidance.


Sorry - edited to change ${n\choose 1}$ to ${n\choose 0}$ in second equation. Sloppy of me.

2

There are 2 best solutions below

9
On BEST ANSWER

The probability person $B$ shares person $A$'s birthday is $1/N$, where $N$ is the number of equally possible birthdays,

so the probability $B$ does not share person $A$'s birthday is $1-1/N$,

so the probability $n-1$ other people do not share $A$'s birthday is $(1-1/N)^{n-1}$,

so the expected number of people who do not have others sharing their birthday is $n(1-1/N)^{n-1}$,

so the expected number of people who share birthdays with somebody is $n\left(1-(1-1/N)^{n-1}\right)$.

3
On

I will try to get control of the most standard interpretation of our question by using (at first) very informal language. Let us call someone unhappy if one or more people share his/her "birthday." We want to find the "expected number" of unhappy people.

Define the random variable $X$ by saying that $X$ is the number of unhappy people. We want to find $\text{E}(X)$. Let $p_i$ be the probability that $X=i$. Then $$\text{E}(X)=\sum_{i=0}^{n} i\,p_i$$ That is roughly the approach that you took. That approach is correct, and a very reasonable thing to try. Indeed have been trained to use this approach, since that's exactly how you solved the exercises that followed the definition of expectation.

Unfortunately, in this problem, finding the $p_i$ is very difficult. One could, as you did, decide that for a good approximation, only the first few $p_i$ really matter. That is sometimes true, but depends quite a bit on the values $N$ of "days in the year" and the number $n$ of people.

Fortunately, in this problem, and many others like it, there is an alternative very effective approach. It involves a bit of theory, but the payoff is considerable.

Line the people up in a row. Define the random variables $U_1,U_2,U_3,\dots,U_n$ by saying that $U_k=1$ if the $k$-th person is unhappy, and $U_k=0$ if the $k$-th person is not unhappy. The crucial observation is that $$X=U_1+U_2+U_3+\cdots + U_n$$

One way to interpret this is that you, the observer, go down the line of people, making a tick mark on your tally sheet if the person is unhappy, and making no mark if the person is not unhappy. The number of tick marks is $X$, the number of unhappy people. It is also the sum of the $U_k$.

We next use the following very important theorem: The expectation of a sum is the sum of the expectations. This theorem holds "always." The random variables you are summing need not be independent. In our situation, the $U_k$ are not independent, but, for expectation of a sum, that does not matter. So we have $$\text{E}(X)=\text{E}(U_1) + \text{E}(U_2)+ \text{E}(U_3)+\cdots +\text{E}(U_n)$$

Finally, note that the probability that $U_k=1$ is, as carefully explained by @Henry, equal to $p$, where $$p=1-(1-1/N)^{n-1}$$ It follows that $\text{E}(U_k)=p$ for any $k$, and therefore $\text{E}(X)=np$.