In a random sampling with replacement from a population with $N$ distinct elements, let $X$ be the number of distinct elements included in the sample.If $n$ be the sample size, find $E(X)$ and $V(X)$
What I attempted
I am not sure whether my approach is correct or not.
I think minimum value $X$ can take is $1$, that is, all the elements of the sample is same. Since there are $N$ elements in the population the required probability is $\frac{N}{N^n}$
Similarly, $P(X=2)=\frac{N(N-1)}{N^n}$
$P(X=3)=\frac{N(N-1)(N-2)}{N^n}$
(Proceeding in this way), I have
$P(X=n)=\frac{N(N-1)(N-2).............(N-n+1)}{N^n}$
Therefore, $E(X)=1.\frac{N}{N^n}+2.\frac{N(N-1)}{N^n}+...........+n.\frac{N(N-1)(N-2)}{N^n}=\frac{1}{N^n}(\sum_{k=1}^{n} k. ^NP_k)$
However, after the examination, some of my classmates pointed out that they got something different. Their answer was $E(X)=N[1-(\frac{N-1}{N})^n]$
One of them asked me why I did not use the recurrence relation. I was not able to understand what he was meaning.
Is my approach correct? Have I made any mistake?
When counting random things, it is often powerful to adopt the method of indicator functions.
Let $Y_i$ be the $i$-th sample, and let $X_n$ be the number of distinct samples up to the $n$-th sampling. If we denote the indicator function of the event $A$ by $\mathbf{1}[A]$, then
\begin{align*} X_n &= X_{n-1} + \mathbf{1}[Y_n \notin \{Y_1, \cdots, Y_{n-1}\}] \\ &= \sum_{i=1}^{n} \mathbf{1}[Y_i \notin \{ Y_1, \cdots, Y_{i-1} \}]. \end{align*}
Now given the size $X_{n-1}$ of the set $\{Y_1, \cdots, Y_{n-1}\}$, we compute the following probability
$$ \mathsf{P}[Y_n \notin \{Y_1, \cdots, Y_{n-1}\} \mid X_{n-1}] = 1 - \frac{X_{n-1}}{N}. $$
So it follows that $\mathsf{E}[X_1] = 1$ and
$$ \mathsf{E}[X_n] = \mathsf{E}[X_{n-1}] + \mathsf{E}\left[ 1 - \frac{X_{n-1}}{N} \right] = 1 + \left( 1 - \frac{1}{N} \right) \mathsf{E}[X_{n-1}].$$
Solving this recurrence relation yields
$$ \mathsf{E}[X_n] = \sum_{k=0}^{n-1} \left( 1 - \frac{1}{N} \right)^k = N \left( 1 - \left(1 - \frac{1}{N}\right)^n \right) $$
as claimed by the classmates. $\mathsf{E}[X_n^2]$ can be computed in a similar way, from which you can compute the variance as well.