In a recent post "Fair die or not from 3-D printer" on this site, @Eumel reported making a die on a 3-D printer, provided data on the faces seen in 150 rolls, and wondered about "the chances that the die is fair."
Several comments raised philosophical issues about this question, and @BrianTung explicitly raised the possibility of taking a Bayesian approach. I showed results of a chi-squared goodness-of-fit test, along with the power of the test against one specific alternative to fairness. In particular, the data showed that face 1 appeared 21 times in 150 rolls.
The main purpose of this post is to show a Bayesian probability interval for the probability $\theta$ that the die shows face 1, based on the prior distribution $\theta \sim Beta(12, 60).$ This particular prior is chosen because it has $E(\theta) = 1/6$ and $P(0.1 < \theta < 0.25) \approx 0.95,$ which seemed to me to correspond to the prior opinion of a reasonable person upon a cursory inspection of the die. (If not, another prior is easily substituted.)
Traditional frequentist statistical analysis gives no direct probability information about parameters of probability distributions. In particular, it will not tell you the probability that a die is fair. Essentially, frequentist analysis makes probability statements about the data, not about the parameters.
Frequentist Confidence Interval. For example, knowing that face 1 appeared 21 times in 150 rolls, you can get a frequentist 95% confidence interval for the probability $\theta$ of seeing face 1 on any one roll: $\hat \theta \pm 1.96\sqrt{\hat \theta(1 - \hat \theta)/150},$ where $\hat \theta = 21/150$. This computes to $(0.084, 0.196).$ Such an interval can be interpreted to mean that the process used gives an interval that covers the true value of $\theta$ in 95% of 150-roll experiments. But $\theta$ itself is an unknown fixed value, which is either in this interval or not. For the data at hand, the confidence interval includes the value $\theta = 1/6,$ and you might say something like "the data are consistent with face 1 showing 1/6th of the time."
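In R, this Wald interval is a one-liner (a quick check of the arithmetic):

```r
# Wald 95% confidence interval for theta, based on 21 occurrences of
# face 1 in 150 rolls
x <- 21; n <- 150
theta.hat <- x/n
me <- 1.96*sqrt(theta.hat*(1 - theta.hat)/n)   # margin of error
round(theta.hat + c(-1, 1)*me, 3)              # 0.084 0.196
```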
Bayesian approach. By contrast, the Bayesian approach considers $\theta$ to be a random variable. One begins with a prior distribution of $\theta,$ based on experience with or personal opinion about the situation at hand. Here, the prior distribution is $f(\theta) \propto \theta^{\alpha_0 - 1}(1 - \theta)^{\beta_0 -1},$ where $\alpha_0 = 12$ and $\beta_0 = 60.$ Also, the likelihood function is $f(x|\theta) \propto \theta^{21}(1-\theta)^{129}.$ According to the general version of Bayes' Theorem, $$\text{POSTERIOR}\propto \text{PRIOR}\times\text{LIKELIHOOD},$$ where the proportionality symbol $\propto$ indicates that we have omitted the constant that makes each distribution integrate or sum to unity. Accordingly, the posterior distribution is found as $$f(\theta|x) \propto f(\theta)f(x|\theta) = \theta^{12 - 1}(1 - \theta)^{60 -1} \times \theta^{21}(1-\theta)^{129} \propto \theta^{33-1}(1-\theta)^{189-1}.$$ Thus, we recognize the kernel of the posterior distribution as that of $Beta(33, 189).$
The posterior distribution is a melding of the information in the prior distribution and in the likelihood function of the data. Cutting 2.5% from each tail of the posterior distribution, we obtain the 95% Bayesian posterior probability interval $(0.105, 0.198),$ which includes 1/6.
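In R, the interval comes directly from the beta quantile function:

```r
# 95% posterior probability interval: cut 2.5% from each tail of Beta(33, 189)
round(qbeta(c(0.025, 0.975), 33, 189), 3)   # compare with the interval quoted above
```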
If we believe that the prior is reasonable and that the data are reliable, then we can say there is 95% probability that $\theta$ lies in this interval. This statement provides a direct probability statement about the random variable $\theta.$
Also, the mode, median, and mean of the posterior distribution are 0.145, 0.148, and 0.149, respectively; any of these might be used as a Bayesian point estimate of $\theta.$
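These can be computed as follows (the mode and mean of a $Beta(a, b)$ distribution have closed forms; the median does not, so we use `qbeta`):

```r
# Point estimates from the posterior Beta(a, b) with a = 33, b = 189
a <- 33; b <- 189
post.mode   <- (a - 1)/(a + b - 2)   # closed form, valid for a, b > 1
post.median <- qbeta(0.5, a, b)      # no simple closed form; use the quantile function
post.mean   <- a/(a + b)
round(c(post.mode, post.median, post.mean), 3)
```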
The left panel of the figure below shows the prior density function, its mean, and values cutting 2.5% from each tail. The right panel shows the posterior distribution and posterior probability interval.
Notes: (1) You may wonder why the prior distribution was chosen from the beta family. Reasons: First, because the support of beta distributions is $(0,1),$ which seems natural when modeling a probability. Second, for convenience; the beta prior and the binomial likelihood have compatible mathematical forms, which makes it easy to deduce the posterior beta distribution without tedious computation. (Intuitively, we might instead have chosen the prior $Norm(0.1667, 0.044).$ Its density is shown as a faint dotted curve in the figure above. But its support is the entire real line, and computation of the posterior would have been messy. Because of their similar mathematical form, we say that the beta prior and the binomial likelihood are 'conjugate'.)
(2) It is seldom true in applications that there is no basis at all for selecting a prior distribution. Here, we can inspect the die to see that it has six roughly square faces, and no prominent edges to interfere with honest rolling. If we really have no prior information, we might choose a 'flat' or 'non-informative' prior. The naive possibility would be $Unif(0,1) = Beta(1,1),$ but some theoretical considerations might point to $Beta(.5, .5)$ or other possibilities. Either way, the influence of the prior on the posterior would be greatly reduced. Very roughly speaking, one might say that our prior $Beta(12, 60)$ is equivalent to seeing 72 rolls of the die, in which face 1 showed 12 times.
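As an illustration of that reduced influence (a sketch, using the same face-1 data), one can compare the 95% posterior intervals under the $Beta(12, 60)$ prior and the flat $Beta(1, 1)$ prior:

```r
# Compare posteriors under the informative Beta(12, 60) prior and a flat
# Beta(1, 1) prior, with 21 occurrences of face 1 in 150 rolls
x <- 21; n <- 150
rbind(informative = qbeta(c(0.025, 0.975), 12 + x, 60 + n - x),
      flat        = qbeta(c(0.025, 0.975),  1 + x,  1 + n - x))
```

The flat-prior interval is wider and centered closer to the sample proportion $21/150,$ reflecting that it adds the equivalent of only 2 prior 'rolls' rather than 72.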
(3) Here is a short program in R to find the parameters $\alpha_0$ and $\beta_0$ of the prior distribution from simple features of the distribution (mean and spread):
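One way to write such a program (a sketch; the grid range and the exact matching criterion are my choices): hold the mean at $1/6$ (forcing $\beta_0 = 5\alpha_0$) and pick the $\alpha_0$ that puts prior probability closest to 0.95 on $(0.1, 0.25).$ Depending on how 'spread' is quantified, such a search may land near, rather than exactly on, $Beta(12, 60).$

```r
# Grid search for prior parameters: hold E(theta) = alpha0/(alpha0 + beta0)
# at 1/6 (so beta0 = 5*alpha0) and pick the alpha0 putting prior probability
# closest to 0.95 on the interval (0.10, 0.25).
alpha.grid <- 1:60
beta.grid  <- 5*alpha.grid
coverage   <- pbeta(0.25, alpha.grid, beta.grid) -
              pbeta(0.10, alpha.grid, beta.grid)
best <- which.min(abs(coverage - 0.95))
c(alpha.0 = alpha.grid[best], beta.0 = beta.grid[best],
  coverage = coverage[best])
```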
(4) The original post referenced at the start shows counts 21, 30, 23, 31, 21, 34 out of 150 rolls for faces 1 through 6, respectively. For this introductory Bayesian analysis, we have looked only at face 1. Searching the Internet for "Bayesian multinomial distribution" fetches a number of scholarly articles and course notes on the general topic of Bayesian analysis of categorical data, a topic especially well suited to the analysis of election polling data.
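As a small taste of that generalization (a sketch; the uniform $Dirichlet(1, \ldots, 1)$ prior is an assumption, not a choice made above): the Dirichlet prior is conjugate to the multinomial likelihood, so the posterior is Dirichlet with the observed counts added to the prior parameters.

```r
# Conjugate Dirichlet-multinomial update for all six faces at once
x <- c(21, 30, 23, 31, 21, 34)   # face counts from the original post
a <- rep(1, 6)                   # uniform Dirichlet(1, ..., 1) prior (an assumption)
post <- a + x                    # posterior is Dirichlet(a + x)
round(post/sum(post), 3)         # posterior mean probability of each face
```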