How to calculate marginal likelihood for coin flip with and without prior?

I was given a problem where I need to "compare a simple and complex model by computing the marginal likelihoods" for a coin flip. There were $4$ coin flips, $\{d_1, d_2, d_3, d_4\}$. The "simple" model is the hypothesis that it is a fair coin, and $P(H) = 0.5$ . The "complex" hypothesis is $P(H) = \theta$ with prior $Beta(\theta|2,2)$ .

I was also given that the formula for marginal likelihood is $p(D|M) = \int_\theta p(D|\theta, M)p(\theta|M)d\theta$ .

A follow-up question I was given is to decide which of these models provides a better account of the data $\{d_1, d_2, d_3, d_4\} = \{H, H, T, H\}$ .


What I have so far:

I will call the "simple" model $M_1$ and the "complex" one $M_2$. My intuition tells me that the marginal likelihood for the simple model is simply $p(D|M_1) = 0.5$, but I am not sure how to show the mathematical calculation of this. For the "complex" model, I am not sure how to proceed. My guess is that $p(\theta|M_2) = Beta(\theta|2,2) = \frac{\Gamma(2+2)}{\Gamma(2)\Gamma(2)}\theta^{2-1}(1-\theta)^{2-1} = 6\theta(1-\theta)$, but I am not sure what $p(D|\theta, M_2)$ would be.

For the follow-up question, my understanding is that I have to compute the marginal likelihoods and then compute the Bayes factor as $\frac{p(D|M_1)}{p(D|M_2)}$ .

Best answer

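For the simple model, $M_1$, the parameter is fixed at $\theta = 0.5$, so there is no prior to integrate over: the marginal likelihood is just the likelihood of the observed sequence at that value, $p(D \mid M_1) = \prod_{i=1}^{4} p(d_i \mid \theta = 0.5) = 0.5^4 = \frac{1}{16}$. Note that $0.5$ is the probability of a single flip, not of the whole data set.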

For the complex model, $M_2$, the likelihood of each flip is Bernoulli with parameter $\theta$, and the prior on $\theta$ is the given Beta distribution with hyperparameters $(2, 2)$. Using the formula for marginal likelihood,

\begin{align}\int_\theta p(D \mid \theta, M_2) p(\theta \mid M_2) d\theta &= \int_\theta \Bigl[\prod_{i=1}^{4}\mathrm{Bernoulli}(d_i \mid \theta)\Bigr]Beta(\theta \mid 2,2)\,d\theta \\ &= \int_\theta \theta^{N_H}(1-\theta)^{N_T}\,6\theta(1-\theta)\,d\theta~,\end{align}

where $N_H$ and $N_T$ are the numbers of heads and tails in the data.

Given 'HHTH', $$\int_\theta \theta^{N_H}(1-\theta)^{N_T}6\theta(1-\theta)d\theta = \int_\theta \theta^{3}(1-\theta)^{1}6\theta(1-\theta)d\theta = \int_{0}^{1} 6\theta^{4}(1-\theta)^{2}d\theta = \frac{2}{35}~.$$
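As a sanity check on this integral, here is a minimal Python sketch that evaluates it exactly, using the standard identity $\int_0^1 \theta^{a-1}(1-\theta)^{b-1}d\theta = B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$ for positive integers $a, b$. The function names `beta_integer` and `beta_bernoulli_marginal` are illustrative and not part of the question.

```python
from fractions import Fraction
from math import factorial


def beta_integer(a: int, b: int) -> Fraction:
    """Beta function B(a, b) for positive integers, computed via factorials."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_bernoulli_marginal(n_heads: int, n_tails: int, a: int = 2, b: int = 2) -> Fraction:
    """Marginal likelihood of one specific flip sequence under a Beta(a, b) prior.

    Integrates theta^n_heads * (1 - theta)^n_tails * Beta(theta | a, b) over [0, 1],
    which equals B(a + n_heads, b + n_tails) / B(a, b).
    """
    return beta_integer(a + n_heads, b + n_tails) / beta_integer(a, b)


# 'HHTH' has 3 heads and 1 tail.
m2 = beta_bernoulli_marginal(n_heads=3, n_tails=1)
print(m2, float(m2))
```

This should reproduce $2/35 \approx 0.0571$. Exact fractions are used here only to avoid floating-point noise; numerical integration would work just as well.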


Now, to select the best model, we can use the Bayes factor,

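$$\frac{p(D \mid M_1)}{p(D \mid M_2)} = \frac{0.5^4}{2/35} = \frac{1/16}{2/35} = \frac{35}{32} \approx 1.09~,$$

where $p(D \mid M_1) = 0.5^4$ because each of the four independent flips has probability $0.5$ under the fair coin. The factor is only slightly above $1$, so $M_1$ is the better model, but the data favor it only weakly.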
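The comparison can also be sketched end to end in Python. This is a rough illustration under stated assumptions: $D$ is treated as the specific ordered sequence `HHTH`, so the fair-coin marginal likelihood is $0.5^4$, and the helper names `marginal_fixed` and `marginal_beta` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def marginal_fixed(flips: str, p_heads: Fraction) -> Fraction:
    """Marginal likelihood when theta is fixed: the plain likelihood of the sequence."""
    n_heads, n_tails = flips.count("H"), flips.count("T")
    return p_heads ** n_heads * (1 - p_heads) ** n_tails


def marginal_beta(flips: str, a: int = 2, b: int = 2) -> Fraction:
    """Marginal likelihood under a Beta(a, b) prior with integer a, b.

    Uses B(a + n_heads, b + n_tails) / B(a, b), with B written via factorials.
    """
    n_heads, n_tails = flips.count("H"), flips.count("T")

    def beta_fn(x: int, y: int) -> Fraction:
        return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))

    return beta_fn(a + n_heads, b + n_tails) / beta_fn(a, b)


flips = "HHTH"
m1 = marginal_fixed(flips, Fraction(1, 2))  # simple model: fair coin
m2 = marginal_beta(flips)                   # complex model: Beta(2, 2) prior
bayes_factor = m1 / m2                      # values above 1 favor the simple model

print(f"p(D|M1) = {m1}, p(D|M2) = {m2}, Bayes factor = {bayes_factor}")
```

Changing `flips` to a more lopsided sequence such as `"HHHH"` shows how quickly the Bayes factor can tip toward the complex model.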