Best estimate of biased coin


I came across this question and its solution, and I want to understand how the solution was obtained, whether it is better than an MLE solution, and why that might be.

There is a loaded coin that flips tails with probability $p$, where $p$ is known only to be distributed uniformly on the interval $[0,1]$. The coin is flipped five times, each time turning up tails. What is the best estimate of $p$?

There is an MLE solution to this that says the estimate is $\frac{5}{5} = 1$. However, does that solution take into account that $p$ "is known only to be distributed uniformly on the interval $[0,1]$"?
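To make the MLE concrete (a sketch of my own; the function name `mle_tails` is not from the question): for a binomial model the maximum-likelihood estimate of the tails probability is just the sample fraction of tails, which here is $1$ regardless of the prior.

```python
def mle_tails(num_tails, num_flips):
    # MLE for a binomial proportion: the observed fraction of tails.
    # Note this ignores any prior information about p entirely.
    return num_tails / num_flips

print(mle_tails(5, 5))  # 1.0
```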

The solution provided takes a Bayesian approach. To be honest, I don't understand it; if anyone could explain it, that would be really helpful. $$ \begin{split} E[p \mid T^5] & = \int_0^1 p \, f(p \mid T^5) \, \text{d}p \\ & = \int_0^1 p \, \frac{P(T^5 \mid p) f_0(p)}{\int_0^1 P(T^5 \mid p) f_0(p) \, \text{d}p} \, \text{d}p = \frac{\int_0^1 p^6 \, \text{d}p}{\int_0^1 p^5 \, \text{d}p} = \frac{6}{7} \end{split} $$
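One way to sanity-check the $\frac{6}{7}$ value is a quick Monte Carlo simulation (my own sketch, not part of the quoted solution): draw $p$ uniformly, flip five coins with tails probability $p$, keep only the runs where all five came up tails, and average the kept $p$ values.

```python
import random

random.seed(0)
kept = []
for _ in range(200_000):
    p = random.random()                               # p ~ Uniform(0, 1) prior
    flips = [random.random() < p for _ in range(5)]   # True means tails
    if all(flips):                                    # condition on T^5
        kept.append(p)

estimate = sum(kept) / len(kept)
print(estimate)   # close to 6/7 ≈ 0.8571
```

The conditional average converges to the posterior mean, which is exactly what the integral above computes.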

Thank you in advance!

Best answer:

The statement "$p$ has a uniform distribution" is called the "prior" distribution of $p$. It carries information, so ignoring it will usually lead to a less efficient estimator (unless the prior is wrong, in which case the Bayesian estimator could be badly biased).

The MLE is totally fine if we don't have a prior (or don't believe it).

In Bayesian statistics we are essentially assuming that the model we are trying to perform inference on was drawn from some larger distribution of possible models. In practice what this means is that the "fixed parameters" are pushed up one level to hyperparameters.

In this case, $p$ has prior density $f_0(p) = 1$ on $[0,1]$ (the $\text{Uniform}(0,1)$ density), so the hyperparameters are $0$ and $1$, the endpoints of the interval. In classical Bayesian inference we don't try to infer these (or we fix them before the experiment). There is a branch called empirical Bayes where the hyperparameters are inferred from the data, but let's keep it simple for this problem.

The full probability model for observing $x$ heads and $y$ tails then becomes: $$P(x,y)=\int_0^1 f_0(p)\,P(x,y\mid p)\,dp$$
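This marginal can be evaluated numerically (a sketch of my own; `marginal` and the midpoint-rule integration are illustrative choices, not from the answer). With the coin showing tails with probability $p$, the likelihood inside the integral is the binomial $\binom{x+y}{y} p^y (1-p)^x$, and $f_0(p) = 1$.

```python
from math import comb

def marginal(x, y, n_steps=10_000):
    # Midpoint-rule approximation of
    #   P(x, y) = ∫₀¹ f0(p) · C(x+y, y) p^y (1-p)^x dp,  with f0(p) = 1.
    total = 0.0
    for i in range(n_steps):
        p = (i + 0.5) / n_steps
        total += comb(x + y, y) * p**y * (1 - p)**x
    return total / n_steps

print(marginal(0, 5))   # ≈ 1/6
```

For $x=0, y=5$ the integral is $\int_0^1 p^5\,dp = \frac{1}{6}$; in fact under a uniform prior every tail count from $0$ to $n$ is equally likely, with probability $\frac{1}{n+1}$.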

Once we observe our data ($x=0$, $y=5$ in your case), we can form the conditional distribution of $p$, which adjusts the probability distribution of $p$ to be consistent with the observed data.

$$f(p|H^xT^y)=\frac{f(p,x,y)}{P(x,y)}=\frac{f_0(p)P(x,y|p)}{\int_0^1f_0(p)P(x,y|p)dp}$$

$f(p|H^xT^y)$ is called the posterior distribution of $p$ and reflects the prior updated by our observations. It is the distribution of $p$ that is consistent with the data (some $p$ values are rendered less likely, others more so).
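A closing note, as a sketch of my own (the conjugacy fact is standard but not stated in the answer): with a uniform prior the posterior is proportional to $p^y(1-p)^x$, i.e. a $\text{Beta}(y+1,\,x+1)$ density, whose mean has the closed form $\frac{y+1}{x+y+2}$.

```python
def posterior_mean(x, y):
    # With a uniform prior, f(p | H^x T^y) ∝ p^y (1-p)^x, a
    # Beta(y + 1, x + 1) density; its mean is (y + 1) / (x + y + 2).
    return (y + 1) / (x + y + 2)

print(posterior_mean(0, 5))   # 6/7 ≈ 0.857
```

Plugging in $x=0$, $y=5$ recovers the $\frac{6}{7}$ from the question, and the formula makes the role of the prior visible: it acts like one extra phantom head and one extra phantom tail pulling the estimate away from the MLE's $1$.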