Expected Distribution of the Outcome Counts When Sampling from a Categorical Random Variable

77 Views Asked by At

From a categorical (multinoulli) distribution with $M$ possible outcomes of arbitrary probability, $N$ samples are drawn with replacement (where $N \gg M$). Each outcome $1\leq i \leq M$ occurs $0 \leq x_i \leq N$ times where $\sum_{i=1}^{M} x_i = N$.

Consider the distribution of the $x_i$'s. Specifically, the quantity of interest is the expected number of $x_i$'s that equal a specific value. For example, what is the expected number of $x_i$'s that equal $k$? If $k=0$, the expected value must be less than $N$. Likewise, if $k=N$, the expected number of $x_i$'s must be less than or equal to $1$. How would this expected count be calculated for an arbitrary $k$?

2

There are 2 best solutions below

0
On BEST ANSWER

Let $Y_i$ take value $1$ if $x_i=k$ and value $0$ otherwise.

Then $Y=Y_1+\cdots+Y_M$ is the cardinality of $\{i\in\{1,\dots,M\}\mid x_i=k\}$ and with linearity of expectation we find that:$$\mathbb EY=\mathbb EY_1+\cdots+\mathbb EY_M=\sum_{i=1}^{M}P(x_i=k)$$

If the probability of an outcome $i$ is $p_i$ and $q_i:=1-p_i$ then this leads to:$$\mathbb EY=\binom{N}{k}\sum_{i=1}^M p_i^{k}q_i^{N-k}$$

0
On

If the probability that a particular draw gives a particular value $m$ im is $p_m$ with $\sum\limits_{m=1}^M p_m = 1$

then the probability that $N$ draws give exactly $k$ copies of value $y$ is ${N \choose k}p_m^k(1-p_m)^{N-k}$

and thus the expected number of values appearing $k$ times is $M$ times this, i.e. $$\sum_{m=1}^{M} {N \choose k}p_m^k(1-p_m)^{N-k}$$