Why is the birthday problem not so surprising?

183 Views Asked by At

I'm reading Blitzstein/Hwang's "Introduction to Probability". Here, they give an explanation as to why the conclusion obtained in the birthday problem is not so surprising:

enter image description here

I don't quite understand how this number of people makes it not so surprising. Could you expand a little bit?

2

There are 2 best solutions below

0
On BEST ANSWER

So first,the ${k \choose 2}$ pairs of people that exist each separately have probability $(364/365)$ to have the same birthday. This part is just a fact.

The part that is approximate is estimating the probability of matches being independent. This is obviously not exactly true, because if $\{ A,B \}$ is a match and $\{ B,C \}$ is a match then $\{ A,C \}$ is also a match. More dramatically, this estimate doesn't provide you with the fact that a match is exactly guaranteed with $k>365$. Still, it is fairly close to being correct when $k$ is small enough. One intuitive explanation is that when $k$ is small enough, there is an overwhelming probability that there are either $0$ or $1$ matches, and this scenario is well-handled by both models.

That said, if you doubt this approximate independence, the only thing that can definitely persuade you is to actually get into the weeds with it a bit. One way to proceed is to extract the $(364/365)^{k \choose 2}$ formula as an asymptotic estimate of $\frac{365!}{(365-k)! 365^k}$. This can be done as follows. First use Stirling's formula:

$$\frac{365!}{(365-k)! 365^k} \sim \frac{\sqrt{2\pi} (365)^{365+1/2} e^{-365}}{\sqrt{2\pi} (365-k)^{365-k+1/2} e^{-(365-k)} 365^k}.$$

This step works better when $k$ is small, but works decently even when $k$ is close to $365$ (the relative error is never more than 10% even when $k=364$).

Now you simplify:

$$\frac{365!}{(365-k)! 365^k} \sim \frac{e^{-k}}{(1-k/365)^{365-k+1/2}}.$$

The next step is to approximate that denominator. It is exactly given by

$$e^{(365-k+1/2) \ln(1-k/365)}.$$

Now for small $k$ we can Taylor expand the logarithm. A trick you can use to save some effort in this is

$$\ln(1-k/365)=-\ln \left ( \frac{1}{1-k/365} \right ) = -\ln \left ( 1 + \frac{k/365}{1-k/365} \right ).$$

Expanding that to second order we have

$$(365-k+1/2) \ln(1-k/365) \sim -(365-k+1/2) \left ( \frac{k/365}{1-k/365} - \frac{1}{2} \left ( \frac{k/365}{1-k/365} \right )^2 \right ).$$

So now the quantity we are interested in is

$$-k + (365-k+1/2) \left ( \frac{k/365}{1-k/365} - \frac{1}{2} \left ( \frac{k/365}{1-k/365} \right )^2 \right ) \\ = \frac{(365-k+1/2)(k/365) - k + k^2/365}{1-k/365} - \frac{1}{2} \frac{(365-k+1/2) k^2/365^2}{(1-k/365)^2} \\ = \frac{1}{2} \left ( \frac{k/365}{1-k/365} - \frac{k^2/365 - k^3/365^2 + (1/2) k^2/365^2}{(1-k/365)^2}\right ).$$

Expanding the denominators and then discarding all terms involving division by $365^2$ or higher powers yields an exponent of

$$\frac{1}{2} \frac{k-k^2}{365}$$

so our estimate for the match probability becomes

$$e^{-\frac{k \choose 2}{365}}.$$

Note that this last estimate is extremely bad if $k$ gets too close to $365$.

Compare this with

$$(364/365)^{k \choose 2}=e^{{k \choose 2} \ln(1-1/365)} \sim e^{-{k \choose 2}/365}$$

which is of course the same thing.

0
On

So we have 253 pairs of people, each of them have a probability 1/365 to be a match. Let N the number of pairs of people whose anniversary is the same day. As mentioned in other answers, the probability to get a match is not so straightforward to estimate because the pairs are no independent. However, the expected number of pairs is easy to compute: E(N) = 253 / 365 ~= 0.69

This expectation may be re-written as follow: E(N) = P(N>=1) + P(N>=2) + P(N=>3) + ...

And now here is the intuition: if you expect that 1 pair is unlikely, observing 2 or more pairs should be still much more unlikely... and the value of P(N>=2) + P(N=>3) + ... should be small compared to P(N=1). This tell us that P(N>=1) cannot be so small, ie at least a collision cannot be unlikely.