I guess I will not be the first programmer bumping into another variation of the Coupon Collector Problem: please forgive my ignorance!
Here is my understanding of the problem at hand:
Let $S$ be a set of $N$ different kinds of coupons.
Assume that the coupons are shipped into random packs of at most 1 coupon of each type: that is each pack is a sequence $a_1,...,a_N$ such as $0 \leqslant a_i \leqslant 1$.
Assume the frequency $f_i$ of these 2^N packs are known but may differ, $\sum_{i=1}^{2^N}{f_i}=1$
Assume a person wants to collect all the $g$ different kinds of coupons, by randomly picking packs.
I am interested in two quantities:
- What is the probability one person can collect all $N$ coupon kinds by drawing only 2 packs?
- What is the probability that two persons get the exact same final, possibly incomplete, collection by drawing 2 (possibly different) packs?
I have seen proofs of the batched version by inclusion/exclusion, (eg here) that I could possibly use and adapt, but -being a dumb programer- I would be very interested by a pointer to an existing solution to this exact (weighted) problem?
This isn’t actually a variation of the coupon collector’s, since you’re only ever drawing $2$ packs and not waiting to complete a collection.
The first probability is
$$ \sum_{i\cup j=[N]}f_if_j\;, $$
where the indices $i,j$ run over all subsets of $[N]=\{1,\ldots,N\}$.
The second probability is
$$ \sum_{k\subset[N]}\left(\sum_{i\cup j=k}f_if_j\right)^2\;. $$
Performing this sum directly would take $O\left(4^N\right)$ time. You can reduce that to $O\left(3^N\right)$ like this: Represent $i$ and $j$ as binary $N$-tuples with the values $0$ (“excluded”) and $1$ (“included”) and consider ternary $N$-tuples that can additionally have the value $2$ (“don’t care”). There are $3^N$ such tuples. For each such tuple $k$, define $f_k$ as the sum of the $f_i$ for all $i$ that match $k$ where $k$ is $0$ or $1$ (i.e. that exclude all the excluded coupons and include all the included coupons, irrespective of whether they include the “don’t care” coupons). You can efficiently compute these $f_k$ recursively, for each $2$ component adding the corresponding values for $0$ and $1$. Now for each $j$, instead of pairing it will all $i$, you only have to pair it with all $k$ that have “don’t care” where $j$ has a $1$, since those values don’t affect the union with $j$. If $j$ has $\ell$ $1$s, there are $2^{N-\ell}$ such $k$ to pair it with, and there are $\binom N\ell$ values of $j$ with $\ell$ $1$s, so the total number of pairs to add up is
$$ \sum_{\ell=0}^N\binom N\ell2^{N-\ell}=3^N\;. $$
I doubt this can be further reduced to $O\left(2^N\right)$, but I might be wrong.