I want to understand how to check if one die is fair using the posterior probability density function, similarly to how a coin is checked in this manner here.
To do this, I am assuming that the die follows a multinomial distribution, giving the probability mass function $$P=\frac{n!}{(n_1!)(n_2!)\cdots(n_x!)}p_1^{n_1}p_2^{n_2}\cdots p_x^{n_x}$$ where $n$ is the total number of trials, $n_x$ is the number of times outcome $x$ occurred, and $p_x$ is the probability of outcome $x$.
For the sake of the example, let's assume that we have a 3-sided die with the letters $A$, $B$ and $C$ on its three faces and we want to check whether it is fair. Let's say that we have thrown the die 9 times and the outcomes were the following:
| Face | Count |
|------|-------|
| A    | 2     |
| B    | 4     |
| C    | 3     |
Now, following this Wikipedia article, we would need to create something along the lines of $$f(a,b,c\mid A=2,B=4,C=3)=\frac{9!}{(2!)(4!)(3!)}a^2b^4c^3$$ where $a$, $b$ and $c$ are the probabilities of the faces $A$, $B$ and $C$ showing up, respectively. When there are only two outcomes, this can be solved, since the remaining parameter can be written as $(1-a)$. However, in this case I am stuck and do not know how to proceed.
Setup
We want to estimate the probability that the die is fair. To do this, we need to define what "fair" means. It does not work to require $p_s=1/S$ exactly for all $s$: under any continuous posterior, the probability that each face shows up with a probability of precisely $1/S$ (with infinite precision) is 0.
To get around this, we pick an acceptable precision, $\epsilon$, and call the die "fair" if $|p_s - 1/S| < \epsilon$ for all $s$. We compute the probability of the die being fair as $$P(|p_s - 1/S| < \epsilon \ \ \forall s).$$
To compute this probability, we still need the posterior distribution over the probability vector $\vec{p} = (p_1, p_2, \ldots, p_S)$ given the observed number of times each face showed, $\vec{n} = (n_1, n_2, \ldots, n_S)$.
It seems reasonable to assume a uniform prior over $\vec{p}$, $$P(\vec{p}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha} = \vec{1}) \propto 1 \quad\text{with}\quad\sum_{s=1}^S p_s=1.$$ However, if we have background information about the die, we can encode it in the prior. For example, if we were fairly confident beforehand that the die is close to fair, we could increase all the components of $\vec{\alpha}$ equally.
It is convenient to write the uniform prior as a Dirichlet distribution since the posterior will also be Dirichlet distributed when the likelihood function is a Multinomial. The Dirichlet distribution is a conjugate prior for the Multinomial likelihood.
The likelihood function is $$P(\vec{n}\;|\;\vec{p}) = \text{Multinomial}(\vec{n}\;|\;\vec{p}) = \frac{n!}{n_1!n_2!\ldots n_S!}p_1^{n_1}p_2^{n_2}\ldots p_S^{n_S}.$$
The posterior distribution is $$P(\vec{p}\;|\;\vec{n}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha}=\vec{1} + \vec{n}).$$
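For the 3-sided example from the question, the observed counts are $\vec{n} = (2, 4, 3)$, so with the uniform prior the posterior is $$P(\vec{p}\;|\;\vec{n}) = \text{Dirichlet}(\vec{p}\;|\;\vec{\alpha} = (3, 5, 4)).$$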
It is with respect to this posterior distribution that we want to compute $$P(|p_s - 1/S| < \epsilon \ \ \forall s)$$ to find whether the die is fair.
2-sided die
A 2-sided die is just a coin. In this case, we can compute the probability of being fair analytically.
Given observations $(n_1, n_2)$ and $\epsilon < \frac{1}{2}$ (in general, we restrict the threshold to $\epsilon < \frac{1}{S}$),
$$ \begin{align} & P(|p_s - 1/2| < \epsilon \ \ \forall s\in\{1,2\}) \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1)} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} p_1^{n_1}\,p_2^{n_2}\,\delta_{1 - p_1 - p_2} \;\text{d}p_2\;\text{d}p_1 \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1)} \int_{\frac{1}{2}-\epsilon}^{\frac{1}{2}+\epsilon} p_1^{n_1} (1-p_1)^{n_2} \;\text{d}p_1 \\ =\;& I_{\frac{1}{2}+\epsilon}(n_1+1, n_2+1) - I_{\frac{1}{2}-\epsilon}(n_1+1, n_2+1) \end{align} $$
Here, $I_x(a, b)$ denotes the regularized incomplete Beta function and $\text{B}(a, b)$ the Beta function.
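A quick numeric check of this formula, assuming SciPy's `betainc` (which computes the regularized incomplete Beta function $I_x(a, b)$); the example counts below are just an illustrative choice:

```python
from scipy.special import betainc

def p_fair_coin(n1, n2, epsilon):
    """P(|p - 1/2| < epsilon) under the Beta(n1 + 1, n2 + 1) posterior."""
    a, b = n1 + 1, n2 + 1
    return betainc(a, b, 0.5 + epsilon) - betainc(a, b, 0.5 - epsilon)

# e.g. 9 flips with 4 heads and 5 tails, epsilon = 1/30
print(p_fair_coin(4, 5, 1 / 30))
```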
3-sided die
For a 3-sided die, we can still compute the probability of being fair analytically but things get more complicated than the 2-sided case. Computing the probability using a numeric integral is still straightforward though.
Given observations $(n_1, n_2, n_3)$ and $\epsilon < \frac{1}{3}$,
$$ \begin{align} & P(|p_s - 1/3| < \epsilon \ \ \forall s\in\{1,2,3\}) \\ =\;& \frac{1}{\text{B}(n_1+1, n_2+1, n_3+1)} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} p_1^{n_1}\,p_2^{n_2}\,p_3^{n_3}\,\delta_{1 - p_1 - p_2 - p_3} \;\text{d}p_3\;\text{d}p_2\;\text{d}p_1 \\ \end{align} $$
where $\text{B}(\cdot,\cdot,\cdot)$ with three arguments is the multivariate generalization of the Beta function, $\text{B}(a_1, a_2, a_3) = \frac{\Gamma(a_1)\,\Gamma(a_2)\,\Gamma(a_3)}{\Gamma(a_1 + a_2 + a_3)}$.
Here, we have to be careful when substituting $p_3 = 1 - p_1 - p_2$ and removing the inner integral since $\frac{1}{3} - \epsilon \le p_3 \le \frac{1}{3} + \epsilon$ places further constraints on the allowable values of $p_2$, affecting the limits of integration on $p_2$.
$$ \begin{align} =\;& \frac{1}{B(n_1+1, n_2+1, n_3+1)} \int_{\frac{1}{3}-\epsilon}^{\frac{1}{3}+\epsilon} p_1^{n_1} \int_{\max(\frac{1}{3}-\epsilon, \frac{2}{3}-\epsilon-p_1)}^{\min(\frac{1}{3}+\epsilon, \frac{2}{3}+\epsilon-p_1)} p_2^{n_2} (1-p_1-p_2)^{n_3} \;\text{d}p_2 \;\text{d}p_1 \\ \end{align} $$
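Before continuing with the analytical solution, here is a minimal numeric version of this double integral, assuming SciPy's `dblquad` and `gammaln`; for the counts from the example it should agree with the Monte Carlo estimate given further below:

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import gammaln

def p_fair_3sided(n1, n2, n3, epsilon):
    """P(fair) for a 3-sided die by numerically integrating the Dirichlet posterior."""
    # log of the multivariate Beta function B(n1 + 1, n2 + 1, n3 + 1)
    log_B = (gammaln(n1 + 1) + gammaln(n2 + 1) + gammaln(n3 + 1)
             - gammaln(n1 + n2 + n3 + 3))

    def integrand(p2, p1):
        return p1**n1 * p2**n2 * (1 - p1 - p2)**n3

    lo, hi = 1/3 - epsilon, 1/3 + epsilon
    val, _ = dblquad(
        integrand, lo, hi,                          # outer limits on p1
        lambda p1: max(lo, 2/3 - epsilon - p1),     # inner lower limit on p2
        lambda p1: min(hi, 2/3 + epsilon - p1),     # inner upper limit on p2
    )
    return val / np.exp(log_B)

print(p_fair_3sided(2, 4, 3, 1 / 30))  # counts from the example above
```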
To solve this integral, we split the integral into 2 parts to get rid of the max and min functions in the limits of integration. After applying some properties of the incomplete beta function, the integral simplifies to
$$ \begin{align} =\;& \sum_{j=n_2+1}^{n_2+n_3+1}\Bigg( \\ &\qquad \frac{1}{j\,\text{B}(j, n_1+n_2+n_3+3-j)} \Bigg(\\ &\qquad\qquad + (1/3+\epsilon)^j (2/3 - \epsilon)^{n_1+n_2+n_3+2-j} \left( I_{\frac{1}{2 - 3\epsilon}}(n_1+1, n_2+n_3+2-j) - I_{\frac{1 - 3\epsilon}{2 - 3\epsilon}}(n_1+1, n_2+n_3+2-j) \right) \\ &\qquad\qquad - (1/3-\epsilon)^j (2/3 + \epsilon)^{n_1+n_2+n_3+2-j} \left( I_{\frac{1 + 3\epsilon}{2 + 3\epsilon}}(n_1+1, n_2+n_3+2-j) - I_{\frac{1}{2 + 3\epsilon}}(n_1+1, n_2+n_3+2-j) \right) \Bigg) \ +\\ &\qquad \frac{1}{(n_1+1+j)\,\text{B}(n_1+1+j, n_2+n_3+2-j)} \Bigg(\\ &\qquad\qquad - (1/3+\epsilon)^{n_2+n_3+1-j} (2/3-\epsilon)^{n_1+j+1} \left( I_{\frac{1}{2-3\epsilon}}(n_1+1, j+1) - I_{\frac{1-3\epsilon}{2-3\epsilon}}(n_1+1, j+1) \right) \\ &\qquad\qquad + (1/3-\epsilon)^{n_2+n_3+1-j} (2/3 + \epsilon)^{n_1+j+1} \left( I_{\frac{1 + 3\epsilon}{2 + 3\epsilon}}(n_1+1, j+1) - I_{\frac{1}{2 + 3\epsilon}}(n_1+1, j+1) \right) \Bigg) \\ \end{align} $$
It might be possible to simplify this further but this is already an analytical solution with no integrals remaining.
Estimating the probability by sampling
For the general case, a die with $S$ sides, I don't have an analytical solution and computing the numeric integral becomes intractable due to the curse of dimensionality.
However, we can still estimate the probability through Monte Carlo integration by generating samples from the Dirichlet posterior and counting which fraction of them fall within the range $1/S\pm\epsilon$.
In the code below, we estimate the probability of the 3-sided die being fair by generating 1,000,000 dice from the posterior distribution over dice and counting how many of them are fair. It is straightforward to generalize this code to a die with more sides. We use $\epsilon = 1/30$ in this example.
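Here is a minimal NumPy sketch of this estimate (the exact printed value varies slightly with the random seed):

```python
import numpy as np

rng = np.random.default_rng()

S = 3                      # number of faces
n = np.array([2, 4, 3])    # observed counts for faces A, B, C
epsilon = 1 / 30           # tolerance for calling the die "fair"

# Posterior over dice: Dirichlet(alpha = 1 + n) for a uniform prior (alpha = 1)
samples = rng.dirichlet(n + 1, size=1_000_000)

# A sampled die is "fair" if every face probability lies within 1/S ± epsilon
fair = np.all(np.abs(samples - 1 / S) < epsilon, axis=1)

print(f"P(fair) ≈ {fair.mean():.6f}")
```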
This prints out `P(fair) ≈ 0.022945`. Because so few counts are observed in the data, many unfair dice are consistent with the observations, so the probability of being fair is low. With more data, we would become more certain. Setting `n = np.array([332, 334, 333])` yields the output `P(fair) ≈ 0.934626`. Note that even with 999 rolls of the die, there is still a substantial probability (about 6.5%) that the die is unfair.

To investigate how the posterior probability of being fair scales with the number of die rolls in the data, let's assume all faces were observed with the same frequency and let the total number of die rolls range from 3 to 3000.
We can also investigate how the posterior probability of being fair scales with $\epsilon$, keeping the number of die rolls observed fixed at 999.
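A sketch of how both of these experiments could be run with the same Monte Carlo estimator; the grids of roll counts and $\epsilon$ values below are illustrative assumptions, not the exact values used for the plots:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def p_fair_mc(counts, epsilon, n_samples=100_000):
    """Monte Carlo estimate of P(fair) under a Dirichlet(counts + 1) posterior."""
    counts = np.asarray(counts)
    S = len(counts)
    samples = rng.dirichlet(counts + 1, size=n_samples)
    return np.all(np.abs(samples - 1 / S) < epsilon, axis=1).mean()

# P(fair) vs. the number of rolls, with all faces observed equally often
rolls = np.arange(3, 3001, 30)                       # coarse grid for speed
p_vs_rolls = [p_fair_mc(np.full(3, r // 3), 1 / 30) for r in rolls]

# P(fair) vs. epsilon, with 999 rolls split evenly over the three faces
epsilons = np.linspace(0.005, 0.10, 20)              # illustrative range
p_vs_eps = [p_fair_mc(np.array([333, 333, 333]), e) for e in epsilons]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(rolls, p_vs_rolls)
ax1.set(xlabel="number of rolls", ylabel="P(fair)")
ax2.plot(epsilons, p_vs_eps)
ax2.set(xlabel="epsilon", ylabel="P(fair)")
plt.show()
```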
Comparing the plots above to the analytic result shows that they match perfectly.