Statistical Testing of a Biased Coin


Somebody comes up to you and says that the quarter he has in his hand is unfair. How do we know if he's telling the truth?

First of all, what are the possible hypotheses? The coin could be completely fixed ($p = 1$) and only land on heads (maybe both sides are the same). The coin could be completely fair ($p = 0.5$) and land on both heads and tails with equal frequency. Or the coin could be between these two extremes ($p \in (0.5, 1)$) and have a varying degree of bias.

So we design an experiment. Without loss of generality, call the coin's more frequent side heads, and let $C$ be the number of heads in $n$ flips, so $C \sim Binomial(n, p)$. We flip the suspected coin $n$ times and observe that it comes up heads $k$ times.

What is $P[p = x | C = k]$? We know by Bayes' theorem that

$P[p = x | C = k] = \frac{P[C = k | p = x] P[p = x]}{P[C = k]}$

However, here is the problem: $p$ must be assigned a continuous probability distribution supported on $(0.5, 1)$, but that entails $P[p = x] = 0$ for every single value $x$. How do I overcome this?

Best Answer

A more conventional, and perhaps more easily digested, Bayesian formulation of this problem would be to begin with a prior Beta distribution on the Heads probability $\theta$.

If you have no prior information or prejudice about the bias of the coin, you might pick $\theta \sim Beta(1, 1) \equiv Unif(0, 1)$ or the so-called Jeffreys prior $\theta \sim Beta(.5, .5),$ a 'bathtub shaped' distribution. If you suspect the coin is "pretty nearly" fair, perhaps the prior distribution would be $Beta(100, 100),$ which (according to a simple computation in R) implies you think $P(.43 < \theta < .57) \approx .95$.

 diff(pbeta(c(.43, .57), 100, 100))
 ## 0.9531024

We say that the prior density function is $p(\theta) \propto \theta^{100 - 1}(1-\theta)^{100-1},$ where the proportionality symbol $\propto$ recognizes that we have omitted the 'constant of integration'.
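To see what the proportionality means in practice, here is a quick R check (a sketch I am adding; `kernel` is my own name): the kernel determines the density only up to a constant, so *ratios* of kernel values must agree with ratios of `dbeta` values.

```r
# The kernel omits the normalizing constant, so ratios of the kernel
# must match ratios of the full Beta(100, 100) density.
kernel <- function(theta) theta^(100 - 1) * (1 - theta)^(100 - 1)
kernel(0.55) / kernel(0.50)                    # ratio from the kernel
dbeta(0.55, 100, 100) / dbeta(0.50, 100, 100)  # same ratio from dbeta
```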

Then suppose you toss the coin 1000 times and get Heads 563 times. This means that your binomial likelihood function is $p(x|\theta) \propto \theta^{563}(1-\theta)^{437}.$
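Concretely (a small R sketch of my own), this likelihood can be evaluated with `dbinom`, and on a fine grid it is maximized at the sample proportion $563/1000 = 0.563$:

```r
# Likelihood of 563 heads in 1000 tosses as a function of theta
lik <- function(theta) dbinom(563, size = 1000, prob = theta)
theta.grid <- seq(0.01, 0.99, by = 0.001)
theta.grid[which.max(lik(theta.grid))]  # 0.563, the sample proportion
```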

The relevant version of Bayes' Theorem for such continuous distributions, used in Bayesian inference, is: $$\text{POSTERIOR} \propto \text{PRIOR} \times \text{LIKELIHOOD}.$$

So we have the posterior 'kernel' $$p(\theta|x) \propto p(\theta)p(x|\theta) = \theta^{100 - 1}(1-\theta)^{100-1} \times \theta^{563}(1-\theta)^{437} = \theta^{663 - 1}(1-\theta)^{537-1},$$ which we recognize as the kernel of $Beta(663, 537).$
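In code, the conjugate update is just addition of counts (a sketch; the variable names are mine):

```r
a0 <- 100; b0 <- 100          # Beta(100, 100) prior
heads <- 563; tails <- 437    # observed data from 1000 tosses
a1 <- a0 + heads; b1 <- b0 + tails
c(a1, b1)                     # 663 537: the posterior is Beta(663, 537)
a1 / (a1 + b1)                # posterior mean, 0.5525
```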

Finally, using R, we find that a 95% Bayesian 'credible' (or 'probability') interval for $\theta$ is $(.524, .581).$ Thus, presumably we would be convinced that the coin is at least slightly unfair.

 qbeta(c(.025,.975), 663, 537)
 ##  0.524 0.581
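The posterior also answers the original question directly (a short sketch using the updated parameters $100 + 563 = 663$ and $100 + 437 = 537$): the posterior probability that the coin favors heads at all is essentially 1.

```r
# Posterior Beta(663, 537), where 663 = 100 + 563 and 537 = 100 + 437
pbeta(0.5, 663, 537, lower.tail = FALSE)  # P(theta > 0.5 | data), > 0.999
```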

Through Bayes' Theorem we have melded together information in the prior distribution with data from an experiment to obtain a Bayesian interval estimate. Moreover, if we later obtain additional data on the same coin, this posterior distribution can become our new prior and we can incorporate the new information to obtain a new posterior distribution and a refined interval estimate.
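For example (a hypothetical second experiment of my own, not part of the original data): if 500 additional tosses gave 280 heads, the posterior from the first 1000 tosses becomes the prior, and the same count-addition update applies:

```r
a1 <- 663; b1 <- 537              # posterior after the first 1000 tosses
heads2 <- 280; tails2 <- 220      # hypothetical further 500 tosses
a2 <- a1 + heads2; b2 <- b1 + tails2
qbeta(c(.025, .975), a2, b2)      # refined 95% interval, roughly (.531, .578)
```

Note that the refined interval is narrower than the one from the first 1000 tosses alone, reflecting the additional information.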

This is about the simplest possible example of Bayesian inference. It is natural to use a beta prior on $\theta$ because the support of beta distributions is $(0,1).$ Moreover, it is easy to find the posterior distribution because the beta prior is 'conjugate to' (mathematically compatible with) the binomial likelihood, producing an easily recognized beta posterior.

I chose to give an example of Bayesian interval estimation, rather than hypothesis testing, because it is a little more difficult to explain Bayesian hypothesis testing.

Note: If you had begun with the 'noninformative' uniform prior, and gotten 563 Heads in 1000 tosses, a 95% Bayesian posterior probability interval would have been $(.532, .593).$

 qbeta(c(.025,.975), 564, 438)
 ## 0.5320643 0.5934465

This is numerically similar to the commonly-used frequentist 95% CI based on 'appending two successes and two failures': $(.532,.593).$

 pm = c(-1,1);  th.hat = 565/1004
 th.hat + pm*1.96*sqrt(th.hat*(1-th.hat)/1004)
 ## 0.532065 0.593433

However, Bayesian and frequentist interpretations differ. The Bayesian interpretation is that the interval has probability 95% of containing the (random) value $\theta.$ By contrast, the frequentist interpretation has to do with the long-run frequency, 95%, with which intervals made by a similar process include the unknown but fixed parameter value $\theta.$
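That frequentist claim can itself be checked by simulation (a sketch assuming a true $\theta = 0.563$): intervals constructed by appending two successes and two failures cover the true value close to 95% of the time in repeated sampling.

```r
set.seed(1)
theta.true <- 0.563; n <- 1000; B <- 10000
x <- rbinom(B, n, theta.true)            # B simulated experiments
th.hat <- (x + 2) / (n + 4)              # append two successes, two failures
me <- 1.96 * sqrt(th.hat * (1 - th.hat) / (n + 4))
covered <- (th.hat - me < theta.true) & (theta.true < th.hat + me)
mean(covered)                            # close to 0.95
```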