Expected value of packs for getting a single instance of each collectible

101 Views Asked by At

I've tried to find a question that solves this problem (or similar) in the community, but I haven't been able to find an answer (maybe it's because I don't know the precise wording of some of the mathematical terms).

Let's imagine that there is a set of collectibles of size n, where each one of these is different from one another. These collectibles are placed inside boxes in a way that 1 box contains exactly 1 collectible. Let's asume that are uniformily distributed.

The question is:

How many boxes do I have to buy on average in order to get at least one of each instance of the collectibles?

To give more context, I found this problem in an Android app and they hint that I can use the linearity of the expectation. However, this is confusing to me. I understand that the probability of obtaining a single instance is 1/n without depending on past results. I guess on average I need to buy n boxes to get that single instance. However, my "mistakes" in trying to get that single instance must be taking into account for the attempt of getting the entire collection. What am I missing here?

1

There are 1 best solutions below

0
On BEST ANSWER

As you suspect, the answer of needing to buy $n$ boxes on average is incorrect. This is easily seen by considering that needing $n$ boxes is actually the best case scenario, but that if you're unlucky, you may need $2n$ boxes, or $3n$ boxes, etc.

This is the so-called Coupon Collector Problem. Let $X_k$ denote the number of boxes needed to from having obtained $n-1$ distinct items to having $k$ items. We immediately see that $\mathbb E[X_1]= 1$, and that $\mathbb E[X_2] = \frac{k}{k-1}$; once you have obtained your first item, the chances of getting a new one are $\frac{k-1}{k},$ and if you repeatedly open boxes to make this happen then you have a geometric variable with $p = \frac{k-1}{k}$ whose expectation is therefore $1/p = \frac{k}{k-1}$. Similarly, we see that $\mathbb E[X_3] = \frac{k}{k-2}$, and more generally, $\mathbb E[X_k] = \frac{n}{n-k+1}$.

We are interested in the expected time to obtain all the distinct items -- that is, $$\mathbb E[X_1 + \dots + X_n] = \mathbb E[X_1] + \dots + \mathbb E[X_n]$$ by linearity of expectation. I'll omit the rest of the details so there's still something to do.

EDIT: I should also address your "what am I missing here?" question more directly. Number the toys $1$ throught $n$, and let $T_k$ denote the number of boxes you need to open to obtain toy number $k$. If $T$ is the time to complete the set, then it is not true that $T = T_1 + \dots + T_k$. (It is true, however, that $\mathbb E[T_k] = 1/n$.)