Probability distribution of number of unique coupons after multiple draws

324 Views Asked by At

Suppose there is a pile of coupons. It contains $N$ different kinds of coupons (there may be one or more of each kind).

I draw $k$ coupons at random ($k<N$). I look at the coupons and see that I obtained $n$ unique coupons. I then sort the coupons by kind, count them, and write a function $ f $ where $c_i$ is the $i$th coupon ($1\le i\le N$) and $q=f(c_i)$ is the number of coupons of kind $c_i$ that I got.

For many $c_i$, $f(c_i)$ will be 1. For some it will be more than one. It is guaranteed (because $k<N$) that at least for some $c_i$, $f(c_i)=0$. I then write $g$ such that $y=g(x)$ is the number of coupon kinds $i$ for which $f(c_i)=x$.

Given $k$ and $N$, what can we say regarding $g(x)$? Specifically, can it be derived analytically?

1

There are 1 best solutions below

2
On

From the given assumption, I assume there is infinitely many coupons (or equivalently there is at least $N$ coupons for each kind, and draw with replacement) and the draws are independent.

In the following I change the notation to facilitate the presentation.

Let $C_i$ be the number of coupons drawn for the $i$-th kind of coupon. Then those $ C_i $ will jointly follow a multinomial distribution:

$$ (C_1, C_2, \ldots, C_N) \sim \text{Multinomial} \left(k; \frac {1} {N}, \frac {1} {N}, \ldots, \frac {1} {N}\right) $$

Let $Y_j$ be the number of kinds of coupons that has exactly $j$ of them being drawn. In order to observe $Y_j \geq y$, one need to first select $y$ out of $N$ kinds of coupon, and distribute $j$ coupons from each kind, for a total of $yj$ coupons, if $yj \leq k$ (otherwise it will be impossible and the probability is $0$). The remaining $k - yj$ coupons will be distribute without restrictions among the remaining $N - y$ kinds of coupons. So

$$ \Pr\{Y_j \geq y\} = \binom {N} {y} \frac {k!} {(j!)^y(k-jy)!} \frac {(N - y)^{k-jy}} {N^k}, j, y = 0, 1, \ldots, k; jy \leq k$$

The pmf can be obtained by successive difference of this survival function.