Expected number of ball-in-hat draws to complete the set

231 Views Asked by At

Suppose you have a hat containing 19 balls.

In the hat there is 1 red ball, 2 blue balls, 2 yellows, 2 greens, 2 oranges, 2 purple, and 8 white balls. Essentially there are 7 colors where one color comprises 1/19 of the total balls, 5 colors each comprise 2/19, and the final color comprises 8/19.

Suppose further that you continuously pick a random ball from the hat and then immediately put it back.

Every time you draw a ball, you add its color to a Set if the Set doesn't yet contain the color. What is the expected number of draws from the hat in order to arrive at the set of all colors, such that you've drawn every color at least once? How do you derive this value mathematically?

I ran a simulation to arrive at an approximation that I'm confident in, however, I can not figure out how to arrive at that value only using mathematics.

1

There are 1 best solutions below

2
On

I have found a paper on solving this problem on JSTOR. For those without database access, here's the summary of their results:

Let $P_n$ be the probability that all $7$ different color have been drawn after $n$ draws. Define $$E_n(X) = n[P_n - P_{n-1}] + (n-1)[P_{n-1} - P_{n-2}] + + \ldots + 8[P_8 - P_7] + 7P_7$$ $P_n - P_{n-1}$ measures the probability that the full set is completed on draw $n$, so we take the weighted sum to get the expected value of the number of draws. Thus, our true expected value $E(X) = \lim_{n\to\infty} E_n(X)$. Simplifying, $$E_n(X) = nP_n - \sum_{i=7}^{n-1}P_i$$ Let $p_j$ represent the probability of getting color $j$ on any given draw. Expanding, $$E_n(X) = nP_n - \sum_{i=7}^{n-1}\left((p_1 + \ldots + p_7)^i - \sum_{\sigma}(p_1 + \ldots p_6)^i + \ldots - \sum_{\sigma}(p_1+p_2)^i + \sum_{\sigma}p_1^i\right)$$ where I'm repurposing notation and letting $\sum_\sigma$ represent the sum over all non-symmetric permtuations. For example, $\sum_{\sigma}(p_1 + p_2)^i = (p_1 + p_2)^i + (p_1 + p_3)^i + \ldots (p_1 + p_7)^i + (p_2 + p_3)^i + (p_2 + p_4)^i + \ldots$

Taking the infinite geometric sum, our expected value becomes $$E(X) = 7 + \sum_{\sigma}\frac{(p_1 + p_2 + \ldots + p_6)^7}{p_7} - \sum_{\sigma} \frac{(p_1 + p_2 + \ldots + p_5)^7}{p_6 + p_7} + \sum_{\sigma}\frac{(p_1+p_2+\ldots p_4)^5}{p_5 + p_6 + p_7}\ldots - \sum_{\sigma}\frac{p_i^7}{p_2 + p_3 + \ldots + p_7}$$ which you can evaluate by plugging in your particular probabilities.