Expected value of the number of distinct results of die rolls in $N$ trials

3.4k Views Asked by At

Given $N$ trials of a die roll, where we have defined $D$ as the number of distinct outcomes, what would be the mean and standard deviation of $D$?

If we have defined $I(k)$ as an indicator random variable which equals 1 if outcome $k$ (such as 6) appears at least once, and 0 otherwise, for $k\in\{ 1,\dots,6\}$, then by definition $$D = \sum\limits_{k=1}^6 I(k)$$ How do the dependencies between the $I(k)$ play into the solution? (Which is the part that is tripping me up the most.)

3

There are 3 best solutions below

3
On BEST ANSWER

We approach this problem from a combinatorial perspective. The number of $n$-roll sequences with $k$ distinct values ($1\le k\le6$), out of $6^n$ sequences total, is $$D(n,k)=\binom6kk!\left\{n\atop k\right\}=\binom6k\sum_{j=0}^k(-1)^{k-j}\binom kjj^n$$ where $\left\{n\atop k\right\}$ is the Stirling number of the second kind and counts the number of ways to partition the rolls into homogeneous subsets, and $\binom6kk!$ is the number of ways to fill those subsets with dice rolls. Letting $n$ vary across the positive integers we get $$D(n,1)=6$$ $$D(n,2)=15\cdot2^n-30$$ $$D(n,3)=-60\cdot2^n+20\cdot3^n+60$$ $$D(n,4)=90\cdot2^n-60\cdot3^n+15\cdot4^n-60$$ $$D(n,5)=-60\cdot2^n+60\cdot3^n-30\cdot4^n+6\cdot5^n+30$$ $$D(n,6)=15\cdot2^n-20\cdot3^n+15\cdot4^n-6\cdot5^n+6^n-6$$ $\frac{D(n,k)}{6^n}$ then gives the probability an $n$-roll sequence will have $k$ distinct values. The expected value of $D$ for a given $n$ is then $$\mu_n=\sum_{k=1}^6k\cdot\frac{D(n,k)}{6^n}=6\left(1-\left(\frac56\right)^n\right)$$ and the standard deviation is $$\sigma_n=\sqrt{\sum_{k=1}^6\frac{D(n,k)}{6^n}(k-\mu_n)^2}=\sqrt{\frac{5\cdot144^n-6\cdot150^n+180^n}{6^{3n-1}}}$$ Note that $$\lim_{n\to\infty}\mu_n=6\text{ and }\lim_{n\to\infty}\sigma_n=0$$ which match our intuition, since for large $n$ almost all roll sequences should contain all six outcomes.

The SymPy code that generated these results can be found here.

10
On

Good tactic.   The distributions of the indicators are identical, though not independent, as you noticed.   Fortunately that is not a critical issue.

A really useful though counterintuitive, property of Expectation is that it is Linear and this holds whether the random variables are independent or not.   So the expectation of the count of distinct results is the sum of the expectation of these indicator random variables.

$$\begin{align}\mathsf E(D) ~&=~ \sum_{k=1}^6 \mathsf E(I(k))\\[1ex] & =~ 6~\mathsf E(I(1))\end{align}$$

Unfortunately this does not quite apply to variance; however, Covariance is Bilinear, which is almost as useful.

$$\begin{align}\mathsf {Var}(D) ~&=~ \sum_{k=1}^6 \mathsf {Var}(I(k))~+ \mathop{2~~\sum}\limits_{k,j~:~1\leq k < j\leq 6}\mathsf {Cov}(I(k), I(j))\\[1ex] &=~ 6~\mathsf {Var}(I(1)) + 30~\mathsf{Cov}(I(1),I(2)) \end{align}$$

You can also use the definition:

$$\begin{align}\mathsf{Var}(D) ~&=~ \mathsf E\left(\left(\sum_{k=1}^6 I(k)\right)^2\right)-\left(\mathsf E\left(\sum_{k=1}^6 I(k)\right)\right)^2 \\[1ex] &=~ 6~\mathsf E\left(I(1)^2\right) + 30~\mathsf E(I(1)I(2))- 36~\mathsf E(I(1))^2 \end{align}$$

So find $\mathsf E(I(1)), \mathsf {Var}(I(1)), \text{ and }\mathsf {Cov}(I(1),I(2))$ and apply them.

Shall we leave that up to you?


You should have for all supported $k$ that $\mathsf E(I(k)^2)=\mathsf E(I(k)) = 1-(\tfrac 56)^N$, and for all supported $j\neq k$ that $\mathsf E(I(j)\cdot I(k))=1-2(\tfrac 56)^N+(\tfrac 46)^N$.

0
On

I just wanted to give another way to answer the question. Instead of partitioning by whether the number $i$, $1 \leq i \leq 6$, has appeared, one can consider $X_i$ to be the indicator random variable corresponding to whether there is an extra distinct unique value because of roll $i$. Hence, $D=X_1 + ... + X_n$.

We have $E(D)=\sum E(X_i)$ and $E(X_i)=\left(\frac{5}{6}\right)^{i-1}$ since there is an extra distinct value if and only if all the previous values are not the same value as roll $i$. Hence $E(D)=\sum_{i=1}^n \left(\frac{5}{6}\right)^{i-1} = \frac{1-\left(\frac{5}{6}\right)^n}{\frac{1}{6}}$.

As for the variance/standard deviation:

$\begin{align} var(D)&=E((X_1+...+X_n)^2)-E(D)^2\\ &=E(X_1^2+...+X_n^2) + 2\sum_{i<j}E(X_iX_j)-E(D)^2 \\ &=E(D)+2\sum_{i=1}^n\sum_{k=1}^{n-i}E(X_iX_{i+k})-E(D)^2 \end{align} $

Hence this boils down to finding $E(X_iX_{i+k})$. Let $A$ be the event that value of roll $i$ not equal value of roll $i+k$}. Since $$E(X_iX_{i+k} | A)=\left(\frac{4}{6}\right)^{i-1} \left(\frac{5}{6}\right)^{k-1}$$ and

$ \begin{align} E(X_iX_{i+k}) &= E(X_iX_{i+k} | A)P(A) + E(X_iX_{i+k} | A^c)P(A^c)\\ &= \left(\frac{4}{6}\right)^{i-1} \left(\frac{5}{6}\right)^{k-1} \left(\frac{5}{6}\right) + 0 \end{align} $

, the answer follows.