Number of date collisions in birthday problem

45 Views Asked by At

If I generate uniform random integers from 1 to K and count how many unique numbers I get $n_\mathrm{unique}$, I empirically obtain:

  • the mean is: $\frac{2K}{\pi}$
  • the variance is $\frac{K}{\pi^{2}}$.

Matching these with the mean and variance formulas of a binomial distribution, I get:

$$p = 1 - 1 / (2 \pi)$$ $$n=\left\lfloor \frac{2K}{\pi-\frac{1}{2}}\right\rfloor$$

This works, but is not perfect.

Is there a way to derive the probability distribution of $n_\mathrm{unique}$?


Attempt

The first integer is always unique and added to the training set. The second has a probability 1/\left(K-1\right) to be added, and a probability 1/K of not being added.

Therefore probability is $$P_{2}(n)=\begin{cases}1/K & n=1\\(K-1)/K & n=2\end{cases}$$

Subsequently, we have two cases, either in iteration j nothing is added, which happens with probability ($n_{j}+1)/K$, where $n_{j}$ is the size of the set so far, or something is added, which happens with probability $$1-(n_{j}+1)/K=(K-n_{j}+1)/K$$.

Let's then consider the probability of obtaining k unique numbers: $$P(X=k)=\frac{K}{K}\times\frac{K-1}{K}\times\frac{K-2}{K}\times...\times\frac{K-k+1}{K}$$

This is wrong or at least incomplete.


Related problems

This is related to the generalised birthday problem and the number of expected collisions, previously discussed in these posts, which however obtain different formulas, possibly because they focus on the expected number of people with shared birthdays, not the number of shared birthdays:

The coupon's collector problem seems related, but this has the number of trials unbounded. I am essentially looking for the coupon's collector's success after n trials of collecting n coupons.