Expected value of the number of distinct results of die rolls in $N$ trials

Question

Expected value of the number of distinct results of die rolls in $N$ trials

3.4k Views Asked by Bumbble Comm At 29 Mar 2026 - 7:14

Given $N$ trials of a die roll, where we have defined $D$ as the number of distinct outcomes, what would be the mean and standard deviation of $D$?

If we have defined $I(k)$ as an indicator random variable which equals 1 if outcome $k$ (such as 6) appears at least once, and 0 otherwise, for $k\in\{ 1,\dots,6\}$, then by definition $$D = \sum\limits_{k=1}^6 I(k)$$ How do the dependencies between the $I(k)$ play into the solution? (Which is the part that is tripping me up the most.)

Original Q&A

There are 3 best solutions below

Bumbble Comm On 10 Feb 2017 - 3:40

Good tactic. The distributions of the indicators are identical, though not independent, as you noticed. Fortunately that is not a critical issue.

A really useful though counterintuitive, property of Expectation is that it is Linear and this holds whether the random variables are independent or not. So the expectation of the count of distinct results is the sum of the expectation of these indicator random variables.

$$\begin{align}\mathsf E(D) ~&=~ \sum_{k=1}^6 \mathsf E(I(k))\\[1ex] & =~ 6~\mathsf E(I(1))\end{align}$$

Unfortunately this does not quite apply to variance; however, Covariance is Bilinear, which is almost as useful.

$$\begin{align}\mathsf {Var}(D) ~&=~ \sum_{k=1}^6 \mathsf {Var}(I(k))~+ \mathop{2~~\sum}\limits_{k,j~:~1\leq k < j\leq 6}\mathsf {Cov}(I(k), I(j))\\[1ex] &=~ 6~\mathsf {Var}(I(1)) + 30~\mathsf{Cov}(I(1),I(2)) \end{align}$$

You can also use the definition:

$$\begin{align}\mathsf{Var}(D) ~&=~ \mathsf E\left(\left(\sum_{k=1}^6 I(k)\right)^2\right)-\left(\mathsf E\left(\sum_{k=1}^6 I(k)\right)\right)^2 \\[1ex] &=~ 6~\mathsf E\left(I(1)^2\right) + 30~\mathsf E(I(1)I(2))- 36~\mathsf E(I(1))^2 \end{align}$$

So find $\mathsf E(I(1)), \mathsf {Var}(I(1)), \text{ and }\mathsf {Cov}(I(1),I(2))$ and apply them.

Shall we leave that up to you?

You should have for all supported $k$ that $\mathsf E(I(k)^2)=\mathsf E(I(k)) = 1-(\tfrac 56)^N$, and for all supported $j\neq k$ that $\mathsf E(I(j)\cdot I(k))=1-2(\tfrac 56)^N+(\tfrac 46)^N$.

Bumbble Comm On 16 Aug 2022 - 11:46

I just wanted to give another way to answer the question. Instead of partitioning by whether the number $i$, $1 \leq i \leq 6$, has appeared, one can consider $X_i$ to be the indicator random variable corresponding to whether there is an extra distinct unique value because of roll $i$. Hence, $D=X_1 + ... + X_n$.

We have $E(D)=\sum E(X_i)$ and $E(X_i)=\left(\frac{5}{6}\right)^{i-1}$ since there is an extra distinct value if and only if all the previous values are not the same value as roll $i$. Hence $E(D)=\sum_{i=1}^n \left(\frac{5}{6}\right)^{i-1} = \frac{1-\left(\frac{5}{6}\right)^n}{\frac{1}{6}}$.

As for the variance/standard deviation:

$\begin{align} var(D)&=E((X_1+...+X_n)^2)-E(D)^2\\ &=E(X_1^2+...+X_n^2) + 2\sum_{i<j}E(X_iX_j)-E(D)^2 \\ &=E(D)+2\sum_{i=1}^n\sum_{k=1}^{n-i}E(X_iX_{i+k})-E(D)^2 \end{align} $

Hence this boils down to finding $E(X_iX_{i+k})$. Let $A$ be the event that value of roll $i$ not equal value of roll $i+k$}. Since $$E(X_iX_{i+k} | A)=\left(\frac{4}{6}\right)^{i-1} \left(\frac{5}{6}\right)^{k-1}$$ and

$ \begin{align} E(X_iX_{i+k}) &= E(X_iX_{i+k} | A)P(A) + E(X_iX_{i+k} | A^c)P(A^c)\\ &= \left(\frac{4}{6}\right)^{i-1} \left(\frac{5}{6}\right)^{k-1} \left(\frac{5}{6}\right) + 0 \end{align} $

, the answer follows.

**Bumbble Comm** · Accepted Answer

We approach this problem from a combinatorial perspective. The number of $n$-roll sequences with $k$ distinct values ($1\le k\le6$), out of $6^n$ sequences total, is $$D(n,k)=\binom6kk!\left\{n\atop k\right\}=\binom6k\sum_{j=0}^k(-1)^{k-j}\binom kjj^n$$ where $\left\{n\atop k\right\}$ is the Stirling number of the second kind and counts the number of ways to partition the rolls into homogeneous subsets, and $\binom6kk!$ is the number of ways to fill those subsets with dice rolls. Letting $n$ vary across the positive integers we get $$D(n,1)=6$$ $$D(n,2)=15\cdot2^n-30$$ $$D(n,3)=-60\cdot2^n+20\cdot3^n+60$$ $$D(n,4)=90\cdot2^n-60\cdot3^n+15\cdot4^n-60$$ $$D(n,5)=-60\cdot2^n+60\cdot3^n-30\cdot4^n+6\cdot5^n+30$$ $$D(n,6)=15\cdot2^n-20\cdot3^n+15\cdot4^n-6\cdot5^n+6^n-6$$ $\frac{D(n,k)}{6^n}$ then gives the probability an $n$-roll sequence will have $k$ distinct values. The expected value of $D$ for a given $n$ is then $$\mu_n=\sum_{k=1}^6k\cdot\frac{D(n,k)}{6^n}=6\left(1-\left(\frac56\right)^n\right)$$ and the standard deviation is $$\sigma_n=\sqrt{\sum_{k=1}^6\frac{D(n,k)}{6^n}(k-\mu_n)^2}=\sqrt{\frac{5\cdot144^n-6\cdot150^n+180^n}{6^{3n-1}}}$$ Note that $$\lim_{n\to\infty}\mu_n=6\text{ and }\lim_{n\to\infty}\sigma_n=0$$ which match our intuition, since for large $n$ almost all roll sequences should contain all six outcomes.

The SymPy code that generated these results can be found here.

Expected value of the number of distinct results of die rolls in $N$ trials

There are 3 best solutions below

Related Questions in PROBABILITY

Related Questions in STATISTICS

Related Questions in DICE

Related Questions in VARIANCE

Trending Questions

Popular # Hahtags

Popular Questions