Pick two sequences of numbers, $S_1$ and $S_2$. $S_1$ is $n_1$ picks from $1$ to $k$, $S_2$ is $n_2$ picks from $1$ to $k$. There could be duplicates within each sequence, for instance $S_1$ might contain the number $471$ twice.
If $S_1$ contains $471$ twice and $S_2$ doesn't contain it, there is no match. If $S_2$ contains $471$ once, there is only one match. If $S_2$ also contains $471$ twice or more, there are two matches. A number from each sequence will match if it's the same as a number in other sequence (order is not important) that is not already matched.
What is the probability $p$ that the two sequences will have exactly $m$ matches between them? Obviously $n_1 \ge m \le n_2$, otherwise $p$ is always $0$.
I can work it out manually for low values of $m$, but once it gets to $3$ I get stuck. It sounds similar to the Birthday Problem.
If $k = 10$, $n_1 = 2$ and $n_2 = 2$, the results are as follows:
| $m$ | $p$ |
|---|---|
| $0$ | $0.657$ |
| $1$ | $0.324$ |
| $2$ | $0.019$ |
But it's complicated to work out by hand.
For instance, if $k = 1000$, $n_1 = 99$, $n_2 = 101$, and $m = 10$, what is the probability $p$?
The calculations indeed look complicated, especially the minimum-of-two element. You could use simulation. For example using R
gives
close to your calculations with some simulation noise, while
gives
suggesting the probability for $m=10$ is about $0.13$