What is this variation of the batched coupon collector?

67 Views Asked by At

I guess I will not be the first programmer bumping into another variation of the Coupon Collector Problem: please forgive my ignorance!

Here is my understanding of the problem at hand:

Let $S$ be a set of $N$ different kinds of coupons.

Assume that the coupons are shipped into random packs of at most 1 coupon of each type: that is each pack is a sequence $a_1,...,a_N$ such as $0 \leqslant a_i \leqslant 1$.

Assume the frequency $f_i$ of these 2^N packs are known but may differ, $\sum_{i=1}^{2^N}{f_i}=1$

Assume a person wants to collect all the $g$ different kinds of coupons, by randomly picking packs.

I am interested in two quantities:

  1. What is the probability one person can collect all $N$ coupon kinds by drawing only 2 packs?
  2. What is the probability that two persons get the exact same final, possibly incomplete, collection by drawing 2 (possibly different) packs?

I have seen proofs of the batched version by inclusion/exclusion, (eg here) that I could possibly use and adapt, but -being a dumb programer- I would be very interested by a pointer to an existing solution to this exact (weighted) problem?

1

There are 1 best solutions below

4
On BEST ANSWER

This isn’t actually a variation of the coupon collector’s, since you’re only ever drawing $2$ packs and not waiting to complete a collection.

The first probability is

$$ \sum_{i\cup j=[N]}f_if_j\;, $$

where the indices $i,j$ run over all subsets of $[N]=\{1,\ldots,N\}$.

The second probability is

$$ \sum_{k\subset[N]}\left(\sum_{i\cup j=k}f_if_j\right)^2\;. $$

Performing this sum directly would take $O\left(4^N\right)$ time. You can reduce that to $O\left(3^N\right)$ like this: Represent $i$ and $j$ as binary $N$-tuples with the values $0$ (“excluded”) and $1$ (“included”) and consider ternary $N$-tuples that can additionally have the value $2$ (“don’t care”). There are $3^N$ such tuples. For each such tuple $k$, define $f_k$ as the sum of the $f_i$ for all $i$ that match $k$ where $k$ is $0$ or $1$ (i.e. that exclude all the excluded coupons and include all the included coupons, irrespective of whether they include the “don’t care” coupons). You can efficiently compute these $f_k$ recursively, for each $2$ component adding the corresponding values for $0$ and $1$. Now for each $j$, instead of pairing it will all $i$, you only have to pair it with all $k$ that have “don’t care” where $j$ has a $1$, since those values don’t affect the union with $j$. If $j$ has $\ell$ $1$s, there are $2^{N-\ell}$ such $k$ to pair it with, and there are $\binom N\ell$ values of $j$ with $\ell$ $1$s, so the total number of pairs to add up is

$$ \sum_{\ell=0}^N\binom N\ell2^{N-\ell}=3^N\;. $$

I doubt this can be further reduced to $O\left(2^N\right)$, but I might be wrong.