Frequency of the "most frequent" outcome when rolling a die $n$ times


Suppose I have a traditional fair die (six faces) and suppose that I roll it $n$ times.

Intuitively, the frequency of a given outcome should tend towards $\frac{1}{6}$ as $n$ tends to infinity.

However, the frequency of the most frequent outcome (so to speak) is very likely to be more than $\frac{1}{6}$. Indeed, in order for the frequency of the most frequent outcome to be $\frac{1}{6}$, we would need to roll a 1 exactly $\frac{n}{6}$ times, 2 exactly $\frac{n}{6}$ times, and so on.
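To put a number on "very likely": the top frequency equals $\frac16$ only if all six counts equal $n/6$, and the probability of that is the multinomial coefficient $\frac{n!}{((n/6)!)^6}$ divided by $6^n$. A small sketch in Python (standard library only; `prob_perfectly_balanced` is a hypothetical helper name) computes it exactly:

```python
from fractions import Fraction
from math import factorial


def prob_perfectly_balanced(n: int, faces: int = 6) -> Fraction:
    """Exact probability that each of `faces` outcomes appears exactly
    n/faces times in n rolls of a fair die (multinomial coefficient / faces**n)."""
    if n % faces != 0:
        return Fraction(0)  # a perfect tie is impossible unless faces divides n
    k = n // faces
    ways = factorial(n) // factorial(k) ** faces  # arrangements with every count equal to k
    return Fraction(ways, faces ** n)


for n in (6, 12, 60, 600):
    print(n, float(prob_perfectly_balanced(n)))
```

For $n = 6$ this gives exactly $\frac{5}{324} \approx 0.015$, and the probability keeps shrinking as $n$ grows.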

  1. What can be said about the frequency of the most frequent outcome (as a function of $n$)?

  2. Does it tend to some specific value as $n$ tends to infinity?

Best answer:

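Let's model the outcome of the $i$-th roll as an indicator vector $r_i \in \{0,1\}^6$: its $k$-th entry is $1$ if the roll shows face $k$ and $0$ otherwise.

Let $R_n := \sum_{i=1}^n r_i$ be the vector of counts of each outcome after $n$ rolls. Then $R_n$ has a multinomial distribution with $n$ trials and probabilities $p_i = \frac16$, $i = 1,\dots,6$.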
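A minimal simulation of this model (Python, standard library only; `roll_counts` is a hypothetical helper name and the seed is arbitrary) builds $R_n$ by counting faces, which is the same as summing the indicator vectors $r_i$, and reads off the frequency of the most frequent outcome:

```python
import random
from collections import Counter


def roll_counts(n: int, rng: random.Random) -> list[int]:
    """Simulate n fair-die rolls and return R_n as a list of six counts.

    Each roll r_i has a single 1 in the slot of the face shown, so summing
    the r_i is the same as counting how often each face appears."""
    faces = Counter(rng.randint(1, 6) for _ in range(n))
    return [faces[k] for k in range(1, 7)]  # Counter returns 0 for unseen faces


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 600
R = roll_counts(n, rng)
top_frequency = max(R) / n  # frequency of the most frequent outcome
print(R, top_frequency)
```

Because the six counts always sum to $n$, `top_frequency` can never be below $\frac16$.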

By the multivariate central limit theorem, just as the binomial distribution approaches the normal distribution, the multinomial approaches a multivariate Gaussian as $n\to \infty$:

$$\sqrt{n}\left[\frac1n R_n - \frac16\right] \xrightarrow{d} N(0,\Sigma)$$

As shown in the Wikipedia article on the multinomial distribution, the covariance matrix has entries

$$\Sigma_{ii} = p_i(1-p_i)=\frac{5}{36}, \\ \Sigma_{ij}=-p_i p_j=-\frac{1}{36}\;\;i\neq j$$
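As a quick check of these entries, the sketch below (Python, standard library only) builds $\Sigma = \operatorname{diag}(p) - p\,p^{\mathsf T}$ with exact fractions. The zero row sums reflect the fact that the components of $R_n$ always add up to $n$:

```python
from fractions import Fraction

p = [Fraction(1, 6)] * 6  # fair die: p_i = 1/6 for every face

# Multinomial covariance per roll: Sigma = diag(p) - p p^T
Sigma = [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(6)] for i in range(6)]

print(Sigma[0][0], Sigma[0][1])      # 5/36 -1/36
print([sum(row) for row in Sigma])  # every row sums to 0
```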

Therefore, for large $n$ you can make approximate statements about the count of each face (and about the negative correlations between counts) from this Gaussian limit.
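For example, reading off a single coordinate (this is only the normal approximation, so the numbers are approximate), each count is roughly $N\left(\frac n6, \frac{5n}{36}\right)$; with $n = 600$:

$$\mathbb{E}[R_{n,k}] = \frac{600}{6} = 100, \qquad \operatorname{sd}(R_{n,k}) \approx \sqrt{\frac{5\cdot 600}{36}} = \sqrt{83.\overline{3}} \approx 9.13,$$

so about 95% of the time a given face shows up roughly $100 \pm 18$ times.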

A couple of things can be said about the behavior of the maximum after $n$ rolls:

The count of the most frequent outcome is the largest component of $R_n$; since all components are nonnegative, this equals the sup norm $\|R_n\|_{\infty}$.

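Let $Q$ be a draw from $N(0,\Sigma)$, and write $R_{n,k}$ for the $k$-th component of $R_n$. Since taking the maximum of the components is a continuous map, the continuous mapping theorem gives

$$\sqrt{n}\left[\frac1n \|R_n\|_{\infty} - \frac16\right] = \max_k \sqrt{n}\left[\frac1n R_{n,k} - \frac16\right] \xrightarrow{d} \max_k Q_k$$

The limit is $\max_k Q_k$ rather than the sup norm $\|Q\|_{\infty} = \max_k |Q_k|$, because $\frac16$ is subtracted from each component before the maximum is taken. Each row of $\Sigma$ sums to zero, so the components of $Q$ sum to zero and $\max_k Q_k \ge 0$.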
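A quick Monte Carlo sketch of the left-hand side (Python, standard library only; the helper `scaled_top_excess`, the seed, and the trial counts are arbitrary illustrative choices). If the convergence holds, the average of $\sqrt{n}\left[\frac1n \max_k R_{n,k} - \frac16\right]$, with $R_{n,k}$ the count of face $k$, should settle near a fixed positive value as $n$ grows instead of drifting to $0$ or blowing up:

```python
import math
import random
from collections import Counter


def scaled_top_excess(n: int, rng: random.Random) -> float:
    """One draw of sqrt(n) * (max_k R_{n,k}/n - 1/6) for n fair-die rolls."""
    counts = Counter(rng.choices(range(6), k=n))  # faces labelled 0..5
    # Unseen faces are missing from the Counter, which does not change the max.
    return math.sqrt(n) * (max(counts.values()) / n - 1 / 6)


rng = random.Random(1)  # arbitrary seed, for reproducibility
trials = 1000
for n in (60, 600, 6000):
    draws = [scaled_top_excess(n, rng) for _ in range(trials)]
    # Monte Carlo estimate of the mean of the scaled excess at this n
    print(n, round(sum(draws) / trials, 3))
```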

So you can infer the large-sample behavior of the most frequent roll from the distribution of this limit. I did a quick internet search, and it seems challenging to analyze except in the iid case, whereas the components here are negatively correlated.
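One way to connect $Q$ to the iid case (a standard construction, offered as a sketch rather than something from the sources above): take $Z = (Z_1,\dots,Z_6)$ with $Z_k$ iid $N\left(0,\frac16\right)$, let $\bar Z$ be their average, and let $J$ be the $6\times 6$ all-ones matrix. Since $I - \frac16 J$ is symmetric and idempotent,

$$\operatorname{Cov}\left(Z - \bar Z\,\mathbf{1}\right) = \frac16\left(I - \frac16 J\right) = \Sigma, \qquad\text{so}\qquad \max_k Q_k \stackrel{d}{=} \max_k Z_k - \bar Z.$$

Because $\bar Z$ has mean zero, $\mathbb{E}[\max_k Q_k] = \mathbb{E}[\max_k Z_k] = \frac{1}{\sqrt6}\,\mathbb{E}[\max \text{ of 6 iid } N(0,1)] \approx \frac{1.267}{\sqrt6} \approx 0.517$, which suggests the heuristic "top frequency $\approx \frac16 + \frac{0.52}{\sqrt n}$" for large $n$.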

As for the long-run behavior, the asymptotic result above shows that the normalized counts $\frac1n R_n$ approach the mean (in probability):

$$\sqrt{n}\left[\frac1n R_n - \frac16\right] \xrightarrow{d} N(0,\Sigma) \implies \left[\frac1n R_n - \frac16\right] \stackrel{\cdot}{\sim} N\left(0,\frac{1}{n}\Sigma\right) \xrightarrow{n\to \infty}N\left(0,\mathbf{0}_{6,6}\right)$$
$$\implies \frac1n R_n - \frac16\ \xrightarrow{p} \mathbf{0} \implies \frac1n R_n \xrightarrow{p} \mathbf{\frac16}$$
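To see this numerically, the sketch below (Python, standard library only; the seed and the values of $n$ are arbitrary) runs one sequence of rolls for each $n$ and prints the frequency of the most frequent face, which should close in on $\frac16 \approx 0.1667$:

```python
import random
from collections import Counter

rng = random.Random(2)  # arbitrary seed, for reproducibility
for n in (60, 600, 6_000, 60_000, 600_000):
    counts = Counter(rng.choices(range(1, 7), k=n))  # one run of n fair-die rolls
    top = max(counts.values()) / n                   # frequency of the most frequent face
    print(f"n={n:>7}  top frequency={top:.4f}  excess over 1/6={top - 1/6:.4f}")
```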

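So, as mentioned in the comments, the frequency of the most frequent outcome also approaches $\frac16$ (in probability), because the largest of the six frequencies is a continuous function of $\frac1n R_n$. This answers question 2: the limit is $\frac16$, and the excess over $\frac16$ shrinks at rate $1/\sqrt{n}$.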

Getting the distribution of the most frequent count for finite $n$ takes more work than these asymptotic results. Hopefully the above gives you some helpful leads.