Statistics find E(x), the number of distinct elements in uniformly distributed pool of items

219 Views Asked by At

Question:

Suppose there are Y types of balls in a bucket, which are normally distributed and independent. Hence the probability of picking one type out is $\frac{1}{Y}$. Let $x$ be the number of different balls taken. What would the $E(x)$ and $Var(x)$ be?

For example, say there are 5 types of balls, $A,B,C,D$ and $E$. We take out $4$ balls = $ACCD$. Then $x = 3$ types of balls.

1

There are 1 best solutions below

7
On BEST ANSWER

I take the question to mean: We have $Y$ balls in a bucket and we sample $k$ balls with replacement; what is the expected value of the number $x$ of different balls sampled? (I assume that "are normally distributed and independent", which doesn't seem to make any sense here, is meant to express that the samples are independent and each sample has uniform distribution over the balls; this has nothing to do with a normal distribution.)

The probability of a single ball being sampled is $1-\left(1-\frac1Y\right)^k$, so the expected number $E[x]$ of balls sampled is $Y\left(1-\left(1-\frac1Y\right)^k\right)$. Denoting by $x_i$ the indicator variable that takes the value $1$ if the $i$-th ball is sampled and $0$ if it isn't, we have $x=\sum x_i$ and thus

$$ E\left[x^2\right]=E\left[\left(\sum_ix_i\right)^2\right]=E\left[Yx_1^2+Y(Y-1)x_1x_2\right]=E[x]+Y(Y-1)E[x_1x_2]^\;. $$

We can get $E[x_1x_2]$, the probability that two particular balls were sampled, by inclusion-exclusion: $E[x_1x_2]=1-\left(1-\frac1Y\right)^k-\left(1-\frac1Y\right)^k+\left(1-\frac2Y\right)^k$. So the variance is

\begin{align} \operatorname{Var}[x]&=E[x^2]-E[x]^2\\ &=Y(Y-1)\left(1-\left(1-\frac1Y\right)^k-\left(1-\frac1Y\right)^k+\left(1-\frac2Y\right)^k\right)+Y\left(1-\left(1-\frac1Y\right)^k\right)\\ &\quad\quad-Y^2\left(1-\left(1-\frac1Y\right)^k\right)^2\;. \end{align}