Drawing colored balls from an urn until all colors have been picked

297 Views Asked by At

Problem statement

Assume we have an urn with $n$ balls in $n$ colors. (for $n=3$ you would have 3 blue, 3 red and 3 green balls in the urn). We pull a ball, check the color, and place it back into the urn. We keep drawing balls until we have picked a ball in each color.

On average (expectancy value) how many times do we draw a ball in a color we have already picked?

Attempt

Unsure where to start we wrote a Python script attempting to solve this problem

import numpy as np
import numpy.random

for n_items in range(3, 7):
    tracker = 0

    runnum = 10 ** 5
    for i in range(runnum):
        Sum = 0
        checkvec = np.zeros(n_items)
        tracker -= n_items
        while Sum < n_items:
            item = np.random.randint(0, n_items)
            checkvec[item] = 1
            Sum = sum(checkvec)
            tracker += 1

    print(n_items, round(100 * (tracker / runnum)) / 100)

Giving the following table

n Extra draws required
1 0
2 1
3 2.5
4 4+1/3

I was unable to find a closed form for this sequence, but some regression analysis led me to the following approximation

$$ B(n) = 0.1571 n^2 + 0.662 n - 0.872 $$

Does there exists a closed form for this problem?

2

There are 2 best solutions below

0
On BEST ANSWER

This is the standard version of the Coupon Collector's Problem, where each colour has an equal probability of getting picked, and you want to know the expected number of draws until you collect every colour.

If the probability of getting a new colour in one draw is $p$, then the expected number of draws until you get that new colour is $1/p$. The probability of getting a new colour starts at $\frac{n}{n}$, and each time the numerator decreases as you collect another colour, so it then goes $\frac{n-1}{n}$, $\frac{n-2}{n}$, ... $\frac{1}{n}$.

The total expected number of draws is therefore $$\frac{n}{n} + \frac{n}{n-1} + ... + \frac{n}{1}$$

which can be rearranged as

$$n \left(\frac{1}{1} + \frac{1}{2} + ... + \frac{1}{n}\right) = nH_n$$

where $H_n$ is the $n$th harmonic number.

If you exclude the $n$ draws in which you get a new colour you get the number of extra draws, which is $nH_n - n = n(H_n-1)$.

2
On

Thanks to N3buchadnezzar for pointing out a mistake in my original answer; see his comment following this answer. Mistake repaired.


Hint

The following analysis, which is based on $n=3$, can be extended to any value of $n$. That is, for each value of $n$, you simply work backwards, as I have done here. If you do this manually, for $n=4$ and $n=5$, you should see a clear pattern in the final expression.


Intermediate results - for $|x| < 1$:

  • $\sum_{k=0}^\infty x^k = \frac{1}{1-x}.$

  • $\sum_{k=0}^\infty (k+1)x^k = \frac{1}{(1-x)^2}.$

The second expression above can be computed by noting that if
$\displaystyle F'(x) = f(x) = \sum_{k=0}^\infty (k+1)x^k$, then
$\displaystyle F(x) = \sum_{k=0}^\infty x^{(k+1)} = \frac{x}{1-x}.$

Therefore $\displaystyle F'(x) = f(x) = \frac{1}{(1-x)^2}.$


Once you reach the point where exactly $2$ colors have been drawn, then from that point forward, the expected number of draws until the 3rd color is drawn is

$$(1/3) \times \sum_{k=0}^\infty \color{red}{(k+1)}(2/3)^k.$$

So, the problem reduces to identifying the expected number of draws, until exactly two colors have been selected.

Assume, without loss of generality, that the first ball drawn is red, and the other two colors are green and blue.

The exact moment when you first draw a non-red ball, you will have succeeded in drawing two colors.

The expected number of draws, starting with draw #2, until you draw a non-red ball is

$$(2/3) \times \sum_{k=0}^\infty \color{red}{(k+1)}(1/3)^k.$$