I want to understand how to check if one die is fair using the posterior probability density function, similarly to how a coin is checked in this manner here.
To do this, I am assuming that the die follows a multinomial distribution, giving the probability mass function $$P=\frac{n!}{(n_1!)(n_2!)\cdots(n_x!)}p_1^{n_1}p_2^{n_2}\cdots p_x^{n_x}$$ where $n$ is the total number of trials, $n_x$ is the number of times outcome $x$ occurred, and $p_x$ is the probability of outcome $x$.
For the sake of the example, let's assume that we have a 3-sided die with the letters $A$, $B$ and $C$ on its three faces and we want to check whether it is fair. Let's say that we have thrown the die 9 times and the outcomes were the following:
| Face | Count |
|------|-------|
| A    | 2     |
| B    | 4     |
| C    | 3     |
Now, following this Wikipedia article, we would need to create something along the lines of $$f(a,b,c\mid A=2,B=4,C=3)=\frac{9!}{(2!)(4!)(3!)}a^2b^4c^3$$ where $a$, $b$ and $c$ are the probabilities of the faces $A$, $B$ and $C$ showing up, respectively. When there are only two outcomes, this can be solved, since the remaining parameter can be written as $(1-a)$. However, in this case I am stuck and do not know how to proceed.
Setup
We want to estimate the probability that the die is fair. To do this, we need to define what "fair" means. It does not work to require $p_s=1/S$ exactly for all $s$: under any continuous posterior, the probability that each face shows up with a probability of precisely $1/S$ (with infinite precision) is 0.
To get around this, we pick an acceptable precision, $\epsilon$, and call the die "fair" if $|p_s - 1/S| < \epsilon$ for all $s$. We compute the probability of the die being fair as $$P(|p_s - 1/S| < \epsilon \ \ \forall s).$$
To compute this probability, we still need the posterior distribution over the probability vector $\vec{p} = (p_1, p_2, \ldots, p_S)$ given the observed number of times each face showed, $\vec{n} = (n_1, n_2, \ldots, n_S)$.
It seems reasonable to assume a uniform prior over $\vec{p}$, $$P(\vec{p}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha} = \vec{1}) \propto 1 \quad\text{with}\quad\sum_{s=1}^S p_s=1.$$ However, if we have background information about the die, we can encode it in the prior. For example, if we were fairly confident beforehand that the die is close to fair, we could increase all the components of $\vec{\alpha}$ equally.
It is convenient to write the uniform prior as a Dirichlet distribution since the posterior will also be Dirichlet distributed when the likelihood function is a Multinomial. The Dirichlet distribution is a conjugate prior for the Multinomial likelihood.
The likelihood function is $$P(\vec{n}\;|\;\vec{p}) = \text{Multinomial}(\vec{n}\;|\;\vec{p}) = \frac{n!}{n_1!n_2!\ldots n_S!}p_1^{n_1}p_2^{n_2}\ldots p_S^{n_S}.$$
The posterior distribution is $$P(\vec{p}\;|\;\vec{n}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha}=\vec{1} + \vec{n}).$$
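For the 3-sided example from the question, the observed counts are $\vec{n} = (2, 4, 3)$, so with the uniform prior the posterior is $$P(\vec{p}\;|\;\vec{n}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha} = (3, 5, 4)).$$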
It is with respect to this posterior distribution that we want to compute $$P(|p_s - 1/S| < \epsilon \ \ \forall s)$$ to find whether the die is fair.
2-sided die
A 2-sided die is just a coin. In this case, we can compute the probability of being fair analytically.
Given observations $(n_1, n_2)$ and $\epsilon < \frac{1}{2}$ (in general, we restrict the threshold to $\epsilon < \frac{1}{S}$),
$$ \begin{align} & P(|p_s - 1/2| < \epsilon \ \ \forall s\in\{1,2\}) \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1)} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} p_1^{n_1}\,p_2^{n_2}\,\delta_{1 - p_1 - p_2} \;\text{d}p_2\;\text{d}p_1 \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1)} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} p_1^{n_1} (1-p_1)^{n_2} \;\text{d}p_1 \\ =\;& I_{\frac{1}{2}+\epsilon}(n_1+1, n_2+1) - I_{\frac{1}{2}-\epsilon}(n_1+1, n_2+1) \end{align} $$
Here, $I_x(a, b)$ denotes the regularized incomplete Beta function and $\text{B}(a, b)$ the Beta function.
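A quick numeric check of this formula, assuming SciPy's `betainc` (which computes the regularized incomplete Beta function $I_x(a, b)$); the example counts below are just an illustrative choice:

```python
from scipy.special import betainc

def p_fair_coin(n1, n2, epsilon):
    """P(|p - 1/2| < epsilon) under the Beta(n1 + 1, n2 + 1) posterior."""
    a, b = n1 + 1, n2 + 1
    return betainc(a, b, 0.5 + epsilon) - betainc(a, b, 0.5 - epsilon)

# e.g. 9 flips with 4 heads and 5 tails, epsilon = 1/30
print(p_fair_coin(4, 5, 1 / 30))
```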
3-sided die
For a 3-sided die, we can still compute the probability of being fair analytically but things get more complicated than the 2-sided case. Computing the probability using a numeric integral is still straightforward though.
Given observations $(n_1, n_2, n_3)$ and $\epsilon < \frac{1}{3}$,
$$ \begin{align} & P(|p_s - 1/3| < \epsilon \ \ \forall s\in\{1,2,3\}) \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1, n_3+1)} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} p_1^{n_1}\,p_2^{n_2}\,p_3^{n_3}\,\delta_{1 - p_1 - p_2 - p_3} \;\text{d}p_3\;\text{d}p_2\;\text{d}p_1 \\ \end{align} $$
where $\text{B}(\cdot,\cdot,\cdot)$ with three arguments is the multivariate generalization of the Beta function, $\text{B}(a_1, a_2, a_3) = \frac{\Gamma(a_1)\,\Gamma(a_2)\,\Gamma(a_3)}{\Gamma(a_1 + a_2 + a_3)}$.
Here, we have to be careful when substituting $p_3 = 1 - p_1 - p_2$ and removing the inner integral since $\frac{1}{3} - \epsilon \le p_3 \le \frac{1}{3} + \epsilon$ places further constraints on the allowable values of $p_2$, affecting the limits of integration on $p_2$.
$$ \begin{align} =\;& \frac{1}{B(n_1+1, n_2+1, n_3+1)} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} p_1^{n_1} \int_{\max(\frac{1}{3}-\epsilon, \frac{2}{3}-\epsilon-p_1)}^{\min(\frac{1}{3}+\epsilon, \frac{2}{3}+\epsilon-p_1)} p_2^{n_2} (1-p_1-p_2)^{n_3} \;\text{d}p_2 \;\text{d}p_1 \\ \end{align} $$
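Before continuing with the analytical solution, here is a minimal numeric version of this double integral, assuming SciPy's `dblquad` and `gammaln`; for the counts from the example it should agree with the Monte Carlo estimate given further below:

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import gammaln

def p_fair_3sided(n1, n2, n3, epsilon):
    """P(fair) for a 3-sided die by numerically integrating the Dirichlet posterior."""
    # log of the multivariate Beta function B(n1 + 1, n2 + 1, n3 + 1)
    log_B = (gammaln(n1 + 1) + gammaln(n2 + 1) + gammaln(n3 + 1)
             - gammaln(n1 + n2 + n3 + 3))

    def integrand(p2, p1):
        return p1**n1 * p2**n2 * (1 - p1 - p2)**n3

    lo, hi = 1/3 - epsilon, 1/3 + epsilon
    val, _ = dblquad(
        integrand, lo, hi,                          # outer limits on p1
        lambda p1: max(lo, 2/3 - epsilon - p1),     # inner lower limit on p2
        lambda p1: min(hi, 2/3 + epsilon - p1),     # inner upper limit on p2
    )
    return val / np.exp(log_B)

print(p_fair_3sided(2, 4, 3, 1 / 30))  # counts from the example above
```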
To solve this integral, we split the integral into 2 parts to get rid of the max and min functions in the limits of integration. After applying some properties of the incomplete beta function, the integral simplifies to
$$ \begin{align} =\;& \sum_{j=n_2+1}^{n_2+n_3+1}\Bigg( \\ &\qquad \frac{1}{j\,\text{B}(j, n_1+n_2+n_3+3-j)} \Bigg(\\ &\qquad\qquad + (1/3+\epsilon)^j (2/3 - \epsilon)^{n_1+n_2+n_3+2-j} \left( I_{\frac{1}{2 - 3\epsilon}}(n_1+1, n_2+n_3+2-j) - I_{\frac{1 - 3\epsilon}{2 - 3\epsilon}}(n_1+1, n_2+n_3+2-j) \right) \\ &\qquad\qquad - (1/3-\epsilon)^j (2/3 + \epsilon)^{n_1+n_2+n_3+2-j} \left( I_{\frac{1 + 3\epsilon}{2 + 3\epsilon}}(n_1+1, n_2+n_3+2-j) - I_{\frac{1}{2 + 3\epsilon}}(n_1+1, n_2+n_3+2-j) \right) \Bigg) \ +\\ &\qquad \frac{1}{(n_1+1+j)\,\text{B}(n_1+1+j, n_2+n_3+2-j)} \Bigg(\\ &\qquad\qquad - (1/3+\epsilon)^{n_2+n_3+1-j} (2/3-\epsilon)^{n_1+j+1} \left( I_{\frac{1}{2-3\epsilon}}(n_1+1, j+1) - I_{\frac{1-3\epsilon}{2-3\epsilon}}(n_1+1, j+1) \right) \\ &\qquad\qquad + (1/3-\epsilon)^{n_2+n_3+1-j} (2/3 + \epsilon)^{n_1+j+1} \left( I_{\frac{1 + 3\epsilon}{2 + 3\epsilon}}(n_1+1, j+1) - I_{\frac{1}{2 + 3\epsilon}}(n_1+1, j+1) \right) \Bigg) \\ \end{align} $$
It might be possible to simplify this further but this is already an analytical solution with no integrals remaining.
Estimating the probability by sampling
For the general case, a die with $S$ sides, I don't have an analytical solution and computing the numeric integral becomes intractable due to the curse of dimensionality.
However, we can still estimate the probability through Monte Carlo integration by generating samples from the Dirichlet posterior and counting which fraction of them fall within the range $1/S\pm\epsilon$.
In the code below, we estimate the probability of the 3-sided die being fair by generating 1,000,000 dice from the posterior distribution over dice and counting how many of them are fair. It is straightforward to generalize this code to a die with more sides. We use $\epsilon = 1/30$ in this example.
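Here is a minimal NumPy sketch of this estimate (the exact printed value varies slightly with the random seed):

```python
import numpy as np

rng = np.random.default_rng()

S = 3                      # number of faces
n = np.array([2, 4, 3])    # observed counts for faces A, B, C
epsilon = 1 / 30           # tolerance for calling the die "fair"

# Posterior over dice: Dirichlet(alpha = 1 + n) for a uniform prior (alpha = 1)
samples = rng.dirichlet(n + 1, size=1_000_000)

# A sampled die is "fair" if every face probability lies within 1/S ± epsilon
fair = np.all(np.abs(samples - 1 / S) < epsilon, axis=1)

print(f"P(fair) ≈ {fair.mean():.6f}")
```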
This prints out `P(fair) ≈ 0.022945`. Because so few counts are observed in the data, many unfair dice are consistent with the observations, so the probability of being fair is low. With more data, we would become more certain. Setting `n = np.array([332, 334, 333])` yields the output `P(fair) ≈ 0.934626`. Note that even with 999 rolls of the die, there is still a substantial probability (about 6.5%) that the die is unfair.

To investigate how the posterior probability of being fair scales with the number of die rolls in the data, let's assume all faces were observed with the same frequency and let the total number of die rolls range from 3 to 3000.
We can also investigate how the posterior probability of being fair scales with $\epsilon$, keeping the number of die rolls observed fixed at 999.
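A sketch of how both of these experiments could be run with the same Monte Carlo estimator; the grids of roll counts and $\epsilon$ values below are illustrative assumptions, not the exact values used for the plots:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def p_fair_mc(counts, epsilon, n_samples=100_000):
    """Monte Carlo estimate of P(fair) under a Dirichlet(counts + 1) posterior."""
    counts = np.asarray(counts)
    S = len(counts)
    samples = rng.dirichlet(counts + 1, size=n_samples)
    return np.all(np.abs(samples - 1 / S) < epsilon, axis=1).mean()

# P(fair) vs. the number of rolls, with all faces observed equally often
rolls = np.arange(3, 3001, 30)                       # coarse grid for speed
p_vs_rolls = [p_fair_mc(np.full(3, r // 3), 1 / 30) for r in rolls]

# P(fair) vs. epsilon, with 999 rolls split evenly over the three faces
epsilons = np.linspace(0.005, 0.10, 20)              # illustrative range
p_vs_eps = [p_fair_mc(np.array([333, 333, 333]), e) for e in epsilons]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(rolls, p_vs_rolls)
ax1.set(xlabel="number of rolls", ylabel="P(fair)")
ax2.plot(epsilons, p_vs_eps)
ax2.set(xlabel="epsilon", ylabel="P(fair)")
plt.show()
```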
Comparing the plots above to the analytic result shows that they match perfectly.