If randomly picked from $m$ symbols, each symbol being equally likely, how long on average would it take to pick $m/2$ different symbols?

171 Views Asked by At

Lets take the specific example of the English alphabet. In this case, I have 26 letters (symbols). Lets say these letters are plastic toys in a box. At random, I take one toy letter from the box, and write it down on a piece of paper. I then put that toy letter back into the box, and repeat the process. How many times, on average, would I have to repeat this process for my piece of paper to contain 13 of the 26 letters of the alphabet?

Is there a way to generalize the answer to $m$ symbols instead of 26? What happens when $m$ is odd?

1

There are 1 best solutions below

2
On BEST ANSWER

As Ross Millikan points out, this is called the coupon collector problem. The calculation isn't hard.

The first pick from the box of $26$ is guaranteed to produce a letter that you haven't drawn already. To get the second such letter requires on average $\frac{26}{25}$ picks — close to one more, but a little bit more than that because you have a small chance of drawing the same letter as you did the first time. To get the third distinct letter requires on average $\frac{26}{24}$ picks, and so on.

To pick at least $k$ different elements from a box containing $n$ different elements one expects to need a total of $$1 + \frac n{n-1} + \frac n{n-2} + \ldots + \frac n{n-k+1} = n\sum_{i=1}^k \frac{1}{n-i+1}$$ picks.

For your example of $n=26, k=13$, this is approximately $17.5$ picks.

You wanted to know what happens when $k=\frac n2$. When $n$ is not too small the number of picks required to get about $\frac n2$ different items can be shown to be approximately $$0.69n$$ where $0.69\approx \log 2$. (To show this, write the formula above as $$n\sum_{i=1}^k \frac{1}{n-i+1} = n(H_n-H_{n-k})$$ where $H_i$ are the harmonic numbers, and use the approximation $H_n\approx \log n + \gamma$.)