Suppose that we have an infinite supply of balls and N boxes. Fix $k \le N$. The balls are thrown at random into the boxes until exactly $k$ boxes are occupied. Let $X$ denote the number of throws needed for this. Calculate $E(X)$ and $\mathrm {Var} (X)$.
If for $i \geq k$, $X=i$ then that implies the success is obtained in the $i$-th trial. So it means that we have exactly $(k-1)$ occupied boxes in the $(i-1)$-th trial. Then the $i$-th ball is placed at some box other than the former $(k-1)$ boxes. How can we do that?
First we choose $(k-1)$ boxes from $N$ boxes. Then we put $(i-1)$ balls into those boxes such that no box is left empty. Then the remaining $i$-th ball can be placed in any of the remaining $(N-k+1)$ boxes.
So according to me we have for $i \geq k$
$$P(X=i) = \frac { {N \choose {k-1}}S(i-1,k-1){N-k+1\choose 1}} {N^i}.$$ Where $S(m,n)$ denotes the number of all surjective functions from an $m$ elements set onto an $n$ elements set.
But then the expressions of expectation and variance become very complicated. Is everything I have done so far correct? If 'yes' can somebody please tell me the easier way to calculate the expectation and variance of the above random variable $X$?
Thank you very much.
This is related to the coupon collector's problem. The time it takes to hit a new empty box when $j$ boxes are already occupied is geometrically distributed with success probability $1-\frac jN$. These times are independent. The expectation and variance of a geometric distribution with success probability $p$ are $\frac1p$ and $\frac{1-p}{p^2}$, respectively. The expectation and variance of the sum of independent random variables are the sum of the expectations and variances, respectively. Thus
\begin{eqnarray*} \mathsf E(X)=\sum_{j=0}^{k-1}\frac1{1-\frac jN}=N\sum_{j=0}^{k-1}\frac1{N-j} \end{eqnarray*}
and
$$ \mathsf{Var}(X)=\sum_{j=0}^{k-1}\frac{\frac jN}{\left(1-\frac jN\right)^2}=N\sum_{j=0}^{k-1}\frac j{(N-j)^2}\;. $$