Expected number of colors in a sampling of colored balls without replacement,

442 Views Asked by At

Suppose there is a box containing differently colored balls. There are $G$ colors, each color having $n$ balls of this color, i.e. $G \times n$ balls.

What is the expected number of different colors in a sample of size $s$ without replacement?

The problem is similar to Expected number of different colors, but without replacement.

In case of $G = 2$ it becomes pretty easy, so with $s$ values of 1 and 2. But in general case I can't seem to avoid double counting the combinations.

1

There are 1 best solutions below

1
On BEST ANSWER

We have from first principles that the PGF in $u$ with the coefficient on $[u^q]$ representing the probability of $q$ different colors / coupons not being seen in a sample of size $s$ is given by

$$\frac{1}{s!} {nG\choose s}^{-1} s! [z^s] \left(u + \sum_{k=1}^n \frac{n!}{(n-k)!} \frac{z^k}{k!}\right)^G.$$

This simplifies to

$${nG\choose s}^{-1} [z^s] \left(u+\sum_{k=1}^n {n\choose k} z^k\right)^G = {nG\choose s}^{-1} [z^s] (u-1+(1+z)^n)^G.$$

As a sanity check we indeed have on setting $u=1$

$$ {nG\choose s}^{-1} [z^s] (1+z)^{nG} = 1.$$

For example, with four colors and four instances each we get for six draws the distribution

$${16\choose 6}^{-1} [z^6] (u-1+(1+z)^4)^4 = {\frac {3\,{u}^{2}}{143}} +{\frac {60\,u}{143}}+{\frac {80}{143}}.$$

where e.g. the last term gives the probability that none of the colors are missing. We cannot have three colors missing because that leaves only one color to cover all six draws, we have only four instances, however. With this PGF we can answer the question about the probability that $q$ colors are missing in a draw of $s$ items, which is

$${nG\choose s}^{-1} [z^s] [u^q] (u-1+(1+z)^n)^G = {nG\choose s}^{-1} [z^s] {G\choose q} (-1+(1+z)^n)^{G-q} \\ = {nG\choose s}^{-1} [z^s] {G\choose q} \sum_{p=0}^{G-q} {G-q\choose p} (-1)^{G-q-p} (1+z)^{np}.$$

This yields for the probability $$\bbox[5px,border:2px solid #00A000]{ {nG\choose s}^{-1} {G\choose q} \sum_{p=0}^{G-q} {G-q\choose p} (-1)^{G-q-p} {np\choose s}}$$

which is inclusion-exclusion.

Returning to the main question we thus have for the expectation of coupons that did not occur

$${nG\choose s}^{-1} \left.\frac{\partial}{\partial u} [z^s] (u-1+(1+z)^n)^G \right|_{u=1} \\ = {nG\choose s}^{-1} [z^s] \left. G (u-1+(1+z)^n)^{G-1} \right|_{u=1} = {nG\choose s}^{-1} [z^s] G (1+z)^{n(G-1)}.$$

We get for the number of coupons that did occur

$$\bbox[5px,border:2px solid #00A000]{ G - G {nG\choose s}^{-1} {nG-n \choose s}.}$$

E.g. when we draw one coupon we obtain

$$G - G \frac{1}{nG} (nG-n) = G - G + G\frac{n}{nG} = 1$$

as expected. Also note that we obtain the value $G$ when $s\gt nG-n$ (second binomial coeffcient is zero). This is because the maximum coverage with $G-1$ colors is $nG-n$ and with the next sample we must use the last missing color.