What if we change the distribution of birthdays in the birthday paradox?

610 Views Asked by At

In a room of $n$ people, it is well known that the probability that nobody shares the same birthday is: $$\frac{365!}{(365-n)!365^n}. $$

Of course we assume here that all $Bd_i$ (random variable for the birthday of the $i$th person) are independent and they have the same distribution $X$, which is uniform on $\{1,...,365\}$.

We know that for $n=23$ the probability $p_n$ that nobody shares the same birthday is below $0.5$.

It doesn't seem relevant to change the independence hypothesis, however the uniform distribution may be changed (one could consider that people are more likely to be born in fall considering the fact that winter's evening...). More seriously, what is the effect of the law $X$ on the probability number $n(X)$ which is the first number for which $p_n$ is below $0.5$ ?

For any probability $X$ we have $n(X)\leq 23=n(\text{uniform law})$.

However, I cannot figure out a way to do this properly. I assume that we have an inequality involved but my probability is not top-level. Any help will be appreciated.

Edit : For instance, some extreme cases.

Let $X$ be Dirac. This means that $X=1$ (i.e. all people are born on the first of January) then $n(X)=2$ (if there are two people in the room they will have the same birthday).

This is the most extreme case. The probability that nobody shares the same birthday is $1$ if $n=1$, $0$ if $n>1$.

Let $X$ be uniform on $\{1,...,31\}$, i.e. every people must be born in January.

Then the probability that nobody shares the same birthday is :

$$\frac{30!}{(30-n)!30^n} $$

We then see that $n(X)=7$.

1

There are 1 best solutions below

0
On BEST ANSWER

This is only a partial answer which addresses the uniformity assumption. As Arthur noted, having non-uniform distribution favors birthday coincidences (collision). In fact, there is a stronger result.

For a probability distribution $\mathbf{p} = (p_1,p_2,\dots,p_N)$, let $T^{\mathbf p}$ be the the time of first collision for persons whose birthdays are distributed according to $\mathbf p$. That is, you take independent random variables $X_1, X_2, \dots$ with $P(X_i = j) = p_j, j=1,\dots,N$ and set $$ T^{\mathbf p} = \min\{k\ge 2: \exists\, i<k \text{ such that } X_i = X_j\}. $$ Naturally, the distribution does not depend on the order of the probabilities, so we are free to assume that $p_1\ge p_2\ge\dots\ge p_N$.

Now the announced stronger claim:

For each $n\ge 1$, the probabilities $P(T^{\mathbf p}> n)$ are Schur-concave in $\mathbf p$. This means that if $\mathbf{p}$ majorizes $\mathbf{q}$ in the sense that $p_1\ge q_1$, $p_1+p_2\ge q_1+q_2$, $\dots$, $p_1+\dots + p_{N-1} \ge q_1+\dots + q_{N-1}$, $p_1+\dots + p_{N} = q_1+\dots + q_{N-1} = 1$, then for any $n\ge 1$ $$ P(T^{\mathbf p}> n)\le P(T^{\mathbf q}> n).$$

The reason is that the probability $$ P(T^{\mathbf p}> n) = n!\sum_{1\le i_1<i_2<\dots<i_n\le N} p_{i_1}p_{i_2}\cdots p_{i_n} $$ is ($n!$ times) the elementary symmetric (Vieta) polynomial, which is Schur-concave for non-negative arguments. The latter can be easily checked with the help of the Schur-Ostrowski criterion: for $i\neq j$ $\frac{\partial}{\partial p_i} T^{\mathbf p} \le \frac{\partial}{\partial p_j} T^{\mathbf p}$ whenever $p_i\ge p_j$. This is pretty obvious if we restrict our attention to the case $p_1\ge p_2\ge \dots\ge p_N$.

Now it remains to note that any distribution with $p_1\ge p_2\ge \dots\ge p_N$ majorizes the uniform distribution on $\{1,2,\dots,N\}$.