Expected number of distinct elements when drawing from a multiset with replacement

125 Views Asked by At

Suppose I have a multiset $M$ with elements $1,\dots,m$ and respective multiplicities $n_1, \dots, n_m$: $$ (1, n_1) \\ (2, n_2) \\ \dots \\ (m, n_m) $$

What is the expected number of distinct elements in a sample of size $k$ (with replacement) from $M$?

2

There are 2 best solutions below

0
On BEST ANSWER

Let $X_i \in \{0,1\}$ be the indicator random variable that the sample contains element $i$.

The expected value $E(X_i)$ is the probability that the sample contains element $i$:

$$\begin{align*} E(X_i) &= \Pr(\text{the sample contains element }i)\\ &= 1-\left(1-\frac{n_i}{s}\right)^k \end{align*}$$

where $s$ is the size of the multiset including duplicates:

$$s = n_1 + n_2 + \cdots + n_m$$

Then the required expected number of distinct elements in a sample of size $k$ (with replacement) is

$$\begin{align*} E &= E\left(\sum_{i=1}^m X_i\right)\\ &= \sum_{i=1}^m E(X_i)\\ &= \sum_{i=1}^{m}\left[1-\left(1-\frac{n_i}{s}\right)^k\right]\\ &= m - \sum_{i=1}^{m}\left(1-\frac{n_i}{s}\right)^k \end{align*}$$

0
On

Hint:

What is the probability that element $1$ was selected at least once in your $k$ draws?

What is the probability that element $2$ was selected at least once in your $k$ draws?

Letting $X_1 = \begin{cases}0&\text{if }1\text{ was not selected}\\1&\text{if }1\text{ was selected at least once}\end{cases}$ and similarly defining $X_2,\dots, X_m$, how can you describe the random variable counting the number of distinct elements selected using these random variables?

$X = X_1+X_2+\dots+X_m$

Apply linearity of expected value.

$E[X]=E[X_1+X_2+\dots+X_m]=E[X_1]+E[X_2]+\dots+E[X_m]$