There is likely a closed form solution for this problem but it's had me puzzled for days. This is about a variant on the classic birthday paradox. To recap, the birthday paradox is where given only 23 students the probability that two have the same birthday is > 0.5.
The variant is this: If each student has two (or K) special days, what is the probability that given N students each student has at least one special day that is unique to them.
I stumbled across this problem while working on a related cryptographic algorithm.
Another way of phrasing this is in terms of a lottery: If a lottery drawing consists of 5 random numbers (with repetition) drawn from a set of 100 numbers, after N drawings what is the probability that each drawing had at least one unique number among it's 5.
Explicitly, given these variables:
$C = $ the size of the set of choices (e.g. 100 for the lottery or 365 for the special days)
$K = $ # of random choices for each event (e.g. 5 for the lottery, 2 for the special days)
$N = $ # of events that have taken place (e.g. lottery drawings, or number of students)
Here is what I'm trying to solve for:
$P(C, K, N) = $ the probability that after N events, where each draws K random items from a set of C items, each event has at least one item that was not drawn in any other event.
Note that $P(C, 1, N)$ provides the solution to the classic birthday paradox.
To be clear, repetition is allowed. That is, a student could have the same special day twice, or the lottery drawing could be (5, 5, 78, 10, 5). In that event, if at least one of those numbers was never seen before, it counts as unique.
The best I've come up with is this:
Given a function, $Q(C, K, N)$ that computes the probability that the $N^{th}$ event has at least one unique value:
We would get these values in the student scenario:
$$ Q(2, 365, 0) = 1 - \frac{0}{365}\frac{0}{365} = 1.00000 $$ $$ Q(2, 365, 1) = 1 - \frac{2}{365}\frac{2}{365} = 0.99996 $$ $$ Q(2, 365, 2) = 1 - \frac{4}{365}\frac{4}{365} = 0.99987 $$
And similarly for the lottery drawing:
$$ Q(5, 100, 0) = 1 - \frac{0}{100}\frac{0}{100} = 1.00000 $$ $$ Q(5, 100, 1) = 1 - \frac{5}{100}\frac{5}{100} = 0.99999 $$ $$ Q(5, 100, 2) = 1 - \frac{10}{100}\frac{10}{100} = 0.99999 $$
This generalizes to:
$$ Q(C, K, N) = 1 - (\frac{N * K}{C})^K $$
If $Q$ computes the probability of any given student having at least one unique day than the probability of N events all having at least one unique day is just multiplying all of the probabilities together:
$$P(C, K, N) = \prod_{n=0}^N 1 - (\frac{n * K}{C})^{K}$$
This looks promising because for the case where $K = 1$, this simplifies to:
$$\prod_{n=0}^N \frac{C - n}{C}$$
which is the formula for the traditional birthday paradox.
The problem is that this proposed formula, beyond $K=1$, isn't actually correct, as verified by a monte-carlo simulation I've been running to compare it against.
I can't seem to wrap my head around what the problem is though. I've tried about a dozen different ways to approach the problem. It seems that there may be no closed form solution, unfortunately.
I think the formula is more like that:
$$ P(C,K,N)= \left(\prod_{n=0}^{N-1}\frac{C-n}{C}\right)\left(\frac{C-N+1}{C}\right)^{(K-1)*N} $$
Intuitively, you have to choose the $N$ draw that will be unique. There are $C$ possibilities among $C$ for the first $C-1$ among $C$ for the second and so on (hence the $\prod_{n=0}^{N-1} \frac{C-n}{C}$).
Ten you have to choose the other draws. You cannot pick the unique one (unless the one the is in the same event as you) hence there is $C-N+1$ choices possible among $C$ and there are $(K-1)*N$ such draw (hence the $\left(\frac{C-N+1}{C}\right)^{(K-1)*N}$).