Fair coin toss and Bayes


You have a coin and your prior assumption is that its probability of heads $\theta$ is chosen from a uniform distribution on $[0, 1]$. You toss the coin 10 times and get 6 heads. What is the estimate of $\theta$?

I figured that it has to be $\frac{6}{10}$, but is there a theorem or rule that can back up my guess?



**Best answer**

You stated "Bayes" in the title of your question; therefore, the posterior estimate of $\theta$ is not a single value, but a distribution.

With a binomial likelihood, the beta distribution is a conjugate prior. That is to say, if $$\theta \sim \operatorname{Beta}(a,b),$$ and $$X \mid \theta \sim \operatorname{Binomial}(n, \theta),$$ the posterior density is $$\theta \mid X \sim \operatorname{Beta}(a+X, b+n-X).$$ In your case, a uniform prior corresponds to the hyperparameters $a = b = 1$, and we observed for $n = 10$ the result $X = 6$. Hence the posterior for $\theta$ is beta distributed with posterior hyperparameters $a^* = 1+6 = 7$ and $b^* = 1+10-6 = 5$, and has density $$f_{\theta \mid X}(\theta) = 2310 \, \theta^6 (1-\theta)^4 \mathbb 1 (0 < \theta < 1).$$
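As a quick sanity check, the conjugate update and the normalizing constant $1/B(7,5) = 2310$ can be reproduced with a few lines of Python (a sketch using only the standard library; it is not part of the original answer):

```python
from math import gamma

# Uniform prior corresponds to Beta(a, b) with a = b = 1
a, b = 1, 1
n, x = 10, 6          # 10 tosses, 6 heads observed

# Conjugate update: posterior is Beta(a + x, b + n - x)
a_post, b_post = a + x, b + n - x   # 7 and 5

# Normalizing constant 1/B(a*, b*) = Gamma(a* + b*) / (Gamma(a*) * Gamma(b*))
norm = gamma(a_post + b_post) / (gamma(a_post) * gamma(b_post))
print(a_post, b_post, round(norm))  # 7 5 2310
```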

The mode of this posterior occurs at $\hat \theta = 3/5 = 0.6$, which is easily found by differentiation. This is also the frequentist maximum likelihood estimator (MLE). However, it is by no means the only meaningful point estimate that can be constructed from the posterior density; e.g., one could consider the expectation, which would be $$\operatorname{E}[\theta \mid X] = \frac{a^*}{a^* + b^*} = \frac{7}{12}.$$ Since you do not specify what type of point estimate you wish to construct, or even whether you want a point estimate at all (you could be intending to construct an interval estimate), it is perhaps best that, in the Bayesian context of the question, we stop at the computation of the posterior density.
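To illustrate the two point estimates, the posterior mode and mean can be computed in closed form and the mean checked by Monte Carlo with the standard library's `random.betavariate` (a numerical sketch, not part of the original answer):

```python
import random

random.seed(0)
a_post, b_post = 7, 5   # posterior hyperparameters from the conjugate update

# Closed-form summaries of the Beta(7, 5) posterior
mode = (a_post - 1) / (a_post + b_post - 2)   # (7-1)/(12-2) = 0.6, the MLE
mean = a_post / (a_post + b_post)             # 7/12 ≈ 0.5833

# Monte Carlo check of the posterior mean
samples = [random.betavariate(a_post, b_post) for _ in range(100_000)]
mc_mean = sum(samples) / len(samples)

print(mode, mean)
print(abs(mc_mean - mean) < 0.005)  # True (well within sampling error)
```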

**Answer**

I don't know of any pertinent probability theorem; my knowledge of probability is limited. Also, I have no opinion on whether Peter's comment is accurate. I would attack the problem as follows:

Suppose that $n$ is a fixed integer (e.g. $10$). Further suppose that $x \in \{0, 1, 2, \cdots, n\}$ and assume that the chance of heads on any one toss is $x/n.$ Calculate the probability of getting exactly 6 heads out of 10 tosses as a function of $x,$ and denote this probability by $P_n(x).$

Then Bayes' theorem should kick in. That is, let $F_n(x)$ denote the (relative) probability that the chance of heads on any one toss is $x/n.$ Then $F_n(x) = \frac{P_n(x)}{P_n(0) + P_n(1) + \cdots + P_n(n)}.$ This will give you the relative value of $F_n(x).$

I presume that the "estimated probability of heads" is to be interpreted as $\sum_{x=0}^n \,\frac{x}{n}F_n(x).$

The above is a key point, where I may well be mistaken. If so, someone, please advise.

Assuming that everything (so far) in my answer is accurate, then the ultimate answer should simply be the limit as $n \to \infty$ of the estimated probability, as a function of $n.$

It wouldn't surprise me if the calculation (as $n \to \infty$) ends with the summation of an infinite number of very small ranges being replaced by an integral.
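The discretization sketched above can be carried out directly. Under this answer's assumptions (a grid of $n+1$ equally spaced candidate values for the chance of heads), the weighted sum already approaches the continuous Bayesian posterior mean $7/12$ for moderately large $n$ (a sketch; the grid size and variable names are mine):

```python
from math import comb

n_grid = 10_000          # the answer's n: grid resolution for candidate probabilities
tosses, heads = 10, 6

# P_n(x): probability of 6 heads in 10 tosses if the chance of heads is x/n
def p(x):
    theta = x / n_grid
    return comb(tosses, heads) * theta**heads * (1 - theta)**(tosses - heads)

weights = [p(x) for x in range(n_grid + 1)]
total = sum(weights)     # denominator P_n(0) + P_n(1) + ... + P_n(n)

# Estimated probability of heads: sum over x of (x/n) * F_n(x)
estimate = sum((x / n_grid) * w for x, w in zip(range(n_grid + 1), weights)) / total
print(estimate)  # ≈ 0.58333, i.e. 7/12
```

This agrees with the posterior mean $7/12$ obtained from the Beta(7, 5) posterior, confirming that the discrete construction converges to the continuous answer.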

**Answer**

One way to solve this is by using Maximum Likelihood Estimation. From Wikipedia (emphasis mine):

From the vantage point of Bayesian inference, MLE is a special case of maximum a posteriori estimation (MAP) that assumes a uniform prior distribution of the parameters (in our case, $\theta$).

Let $x_i$ denote the outcome (heads = 1, tails = 0) of a coin flip and let $X_n = (x_1, x_2, \dots, x_n)$ be a sequence of $n$ flips. Then $m = \sum_{i=1}^n x_i$ is the number of heads in $n$ flips.

By Bayes' Theorem, we have that: $$ f(\theta|X_n) = \frac{f(X_n|\theta) \cdot f(\theta)}{f(X_n)} $$

Assuming $\theta \sim \mathsf{Uniform}(0,1)$ we get: $$ f(\theta|X_n) \propto f(X_n|\theta) $$

This means that the most probable $\theta$ (i.e. MAP) that explains the sequence $X_n$ can be found by maximizing the likelihood $f(X_n|\theta)$ (i.e. MLE).

The likelihood of observing sequence $X_n$ given that the coin is parametrized by $\theta$ is: $$ f(X_n|\theta)=\theta^m \cdot (1-\theta)^{n - m} $$

Now, the maximum likelihood estimator of $\theta$ is: $$ \widehat{\theta}_{\text{MLE}}(X_n) = \arg\max_\theta \big( f(X_n | \theta) \big) = \arg\max_\theta \big( \log f(X_n | \theta) \big) $$

Note that maximizing $f(X_n|\theta)$ is the same as maximizing $\log f(X_n|\theta)$ since $\log$ is monotonically increasing, and in this setting it's much more convenient to work with $\log$.

Therefore: $$ \log f(X_n|\theta) = \log \big(\theta^m \cdot (1-\theta)^{n - m} \big) = m \cdot \log \theta + \left(n-m\right) \cdot \log(1-\theta) $$

Now we can maximize: $$ \frac{\partial}{\partial \theta} \log f(X_n|\theta) = \frac{m}{\theta} - \frac{n-m}{1-\theta} = \frac{m\cdot(1-\theta)-\theta \cdot (n-m)}{\theta \cdot (1-\theta)} = 0 $$

Which finally gives: $$ \boxed{\theta = \frac{m}{n}} $$
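As a check on the closed-form result, the log-likelihood above can also be maximized numerically with a simple grid search (a sketch; for $m = 6$, $n = 10$ the maximizer lands on $m/n = 0.6$):

```python
from math import log

n, m = 10, 6  # 10 tosses, 6 heads

# Log-likelihood: m*log(theta) + (n - m)*log(1 - theta)
def log_lik(theta):
    return m * log(theta) + (n - m) * log(1 - theta)

# Grid search over the open interval (0, 1); endpoints excluded since log(0) = -inf
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_lik)
print(theta_hat)  # 0.6
```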