Expected number of rolls to select $K$ out of $N$ distinct numbers

90 Views Asked by At

I was thinking about a problem.

Lets say you have a dice with $N$ faces numbered from $1$ to $N$.
We begin with an empty set $S=\emptyset$
After each roll, we insert the number that turns up into the set $S$.

What is the expected number of rolls so that the size of set becomes $K$.


My Approach:
Let $f(x,y)$ be the expected number of rolls to select $x$ more distinct numbers when we already have selected $y$ distinct numbers.
$f(K,0)$ is the answer.
Its easy to see that $$f(x,y)=\left(\frac{y}{N}(1+f(x,y)\right) + \left(\frac{N-y}{N}(1+f(x-1,y+1)\right)$$

Doubt:
I know my approach will work, but I am having difficulty in finding the solution using Linearity of Expectation here.
I understand that the probability to select the $(n+1)^{th}$ distinct number will be $\frac{N-n}{N}$.
But why will the answer be $\sum_{n=1}^{K}\frac{N}{N-n}$?

2

There are 2 best solutions below

0
On

If $K=1$, the answer would be just $1$.

If $K=2$, the first attempt would grow the set size to $1$. To get another increment, we need to avoid the same number as the number in the set, which is a geometric distribution with probability $\frac{N-1}{N}$, the expectation is $\frac{N}{N-1}$. Hence, $\sum_{n=\color{red}0}^\color{red}{2-1} \frac{N}{N-n}$.

In general, suppose the existing set already has $k-1$ elements, to get another increment, we need to avoid the same numbers in the set, which is a geometric distribution with probability $\frac{N-K}{N}$, the expectation is $\frac{N}{N-K}$. Including the previous attempt, the total would be $$\sum_{n=0}^{K-1} \frac{N}{N-n}$$

1
On

Both approaches will work.


Your approach.

Note that you can rewrite your recursion as $$\frac{N-y}{N} f(x,y) = 1 + \frac{N-y}{N} f(x-1, y+1)$$

Starting from the base cases $f(0,y) = 0$ for any $y \in [0,N]$, you can use your recursion to show $f(1, y) = \frac{N}{N-y}$ for $y \in [0,N-1]$.

Similarly, you can use your recursion again to show $f(2,y) = \frac{N}{N-y} + \frac{N}{N-(y+1)}$.

You can continue in this manner to get $f(K,y) = \sum_{n=y}^{y+K-1} \frac{N}{N-n}$ for $y \in [0, N-K]$ and in particular $$f(K,0) = \sum_{n=0}^{K-1} \frac{N}{N-n}.$$


Linearity of expectation approach.

Let $X$ be the number of rolls needed to get $K$ out of $N$ numbers. Let $W_0=1$ be such that the $W_0$th roll yields the first distinct number, let $W_0+W_1$ be such that the $(W_0+W_1)$th roll yields the second distinct number, and so on, so that $X = W_0 + W_1 \cdots + W_{K-1}$. Linearity of expectation yields $$E[X] = E[W_0] + E[W_1] + \cdots + E[W_{K-1}].$$

You can show that $W_n$ is the number of independent trials needed to get a success, where the each probability of success is $\frac{N-n}{N}$. This is a geometric random variable with mean $E[W_n] = \frac{N}{N-n}$. Thus the answer is $$\sum_{n=0}^{K-1} \frac{N}{N-n}.$$


Final remarks: this is known as the coupon collector's problem.