Bayesian Statistics: Motivation and Explanation of Marginal Likelihood


$P(\theta|x)$ is the posterior probability. It describes $\textbf{how certain or confident we are that hypothesis $\theta$ is true, given that}$ we have observed data $x$.

Calculating posterior probabilities is the main goal of Bayesian statistics!

$P(\theta)$ is the prior probability, which describes $\textbf{how sure we were that}$ $\theta$ was true, before we observed the data $x$.

$P(x|\theta)$ is the likelihood. $\textbf{If you were to assume that $\theta$ is true, this is the probability}$ that you would have observed data $x$.

$P(x)$ is the marginal likelihood. This is the probability that you would have observed data $x$, whether $\theta$ is true or not.

So, $P(\theta|x) = \frac{P(\theta)\,P(x|\theta)}{P(x)}$
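To make the role of $P(x)$ concrete, here is a small Python sketch. The two candidate values of $\theta$ and their prior weights are my own invented toy example, not from the question: when the hypotheses form a discrete set, the marginal likelihood is simply the prior-weighted sum of the likelihoods.

```python
from math import comb

# Toy model (my own example): a coin is either fair (theta = 0.5)
# or biased towards heads (theta = 0.8), each equally likely a priori.

def likelihood(theta, heads=6, flips=10):
    # P(x | theta): binomial probability of `heads` heads in `flips` flips
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

prior = {0.5: 0.5, 0.8: 0.5}                                # P(theta)
marginal = sum(prior[t] * likelihood(t) for t in prior)     # P(x)
posterior = {t: prior[t] * likelihood(t) / marginal for t in prior}

print(round(marginal, 4))
print({t: round(p, 3) for t, p in posterior.items()})
```

Note that $P(x)$ plays no role in comparing the hypotheses; it only rescales the numerators $P(\theta)P(x|\theta)$ so that the posterior probabilities sum to 1.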

Now I don't completely understand what the marginal likelihood $P(x)$ is. Could anyone please explain the motivation behind the term in simple language, with an example? Thank you.

There are 2 best solutions below

By the law of total probability (this is how the denominator of Bayes' theorem is computed),

$$p(\mathbf{x})=\int_{\Theta}p(\theta)p(\mathbf{x}|\theta)d \theta$$

From a Bayesian point of view, however, it is only a normalisation constant: since $\theta$ is integrated out, $p(\mathbf{x})$ no longer depends on $\theta$.

Example: suppose we have a coin and no idea whether it is fair or not, so our prior distribution for the parameter $\theta$ is uniform on $[0;1]$.

Suppose we flip the coin 10 times and obtain 6 heads.

The likelihood is

$p(\mathbf{x}|\theta)\propto \theta ^6(1-\theta)^4$

Since the prior density equals 1, this function is, up to a normalising constant still to be calculated (your $p(\mathbf{x})$), also the posterior.

Without much calculation, you immediately recognise a beta distribution:

$$\theta|\mathbf{x}\sim Beta (7;5)$$

Thus the constant is

$$\frac{\Gamma(5+7)}{\Gamma(5)\Gamma(7)}=\frac{11!}{4!6!}=2310$$

Of course you can get the same result by solving the integral directly:

$$\int_0^1\theta^6(1-\theta)^4 d \theta=\frac{1}{2310}$$
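You can check this value exactly in Python (a quick verification sketch, using the Beta-function identity $\int_0^1\theta^a(1-\theta)^b\,d\theta = \frac{a!\,b!}{(a+b+1)!}$ for non-negative integers $a$, $b$):

```python
from fractions import Fraction
from math import factorial

a, b = 6, 4  # exponents in θ^6 (1-θ)^4
# ∫_0^1 θ^a (1-θ)^b dθ = a! b! / (a+b+1)!  (a Beta-function identity)
integral = Fraction(factorial(a) * factorial(b), factorial(a + b + 1))
print(integral)  # 1/2310
```

Using `Fraction` keeps the arithmetic exact, so the reciprocal relationship with the constant 2310 is visible at a glance.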


I lost you after "the previous function is also the posterior..."; I could not relate it to the beta distribution. Please try something else, simpler perhaps.

Let's focus on this function

$$ \bbox[5px,border:2px solid black] { P(X=x|\theta)=\binom{10}{x}\theta^x(1-\theta)^{10-x}, \quad \theta \in [0;1] \qquad (1) } $$

defined for $x=0,1,2,\dots,10$.

I suppose you know perfectly well what this function is: the probability (mass) function of a discrete random variable, the Binomial.

In a Bayesian way of thinking, we change the point of view: after observing the value of $X$, the number of successes in the experiment, we look at this function as a function of $\theta$. Say, for example, we observe 6 heads in 10 coin flips.

Now (1) becomes

$$ \bbox[5px,border:2px solid black] { P(\theta|X=6)\propto\binom{10}{6}\theta^6(1-\theta)^4 \qquad (2) } $$

First observation: for (2) to be a pdf on $\theta \in[0;1]$, its integral over the whole support must equal 1.

Second observation: the quantity $\binom{10}{6}$ is only a constant, so we can discard it (we still have to find the exact constant that makes (2) a pdf).

Third observation: if we go through our list of known density functions, and look in particular at the Beta distribution, we see that its density is

$f(x)=C\, x^{a-1}(1-x)^{b-1}, \qquad x \in [0;1]$

Where

$C=\frac{\Gamma(a+b)}{ \Gamma(a)\Gamma(b)}=\frac{(a+b-1)!}{(a-1)!(b-1)!}$
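As a quick sanity check (my own sketch, not part of the answer), both forms of $C$ give the same value for $a=7$, $b=5$:

```python
from math import gamma, factorial

a, b = 7, 5
# Gamma form: Γ(a+b) / (Γ(a) Γ(b))
C_gamma = gamma(a + b) / (gamma(a) * gamma(b))
# Factorial form (integer a, b): (a+b-1)! / ((a-1)! (b-1)!)
C_fact = factorial(a + b - 1) // (factorial(a - 1) * factorial(b - 1))
print(C_gamma, C_fact)  # 2310.0 2310
```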

Fourth observation: expression (2), ignoring the constant, has exactly the same functional form as a beta density; in particular it is a $Beta(7;5)=C\cdot \theta^{7-1}(1-\theta)^{5-1}$.

"Now I don't completely understand what the marginal likelihood $P(x)$ is"

In Bayesian statistics, $P(x)=C$ is just the constant needed to normalise the distribution of the parameter; it is only a number. Often the value of $C$ is found without any calculation, by recognising a known density.
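One last numerical sanity check (again my own sketch): with $C = 2310$, the posterior density $C\,\theta^6(1-\theta)^4$ really does integrate to 1.

```python
# Trapezoidal rule on [0, 1]; the endpoint terms vanish because the
# density is 0 at both θ = 0 and θ = 1.
C = 2310
n = 100_000
h = 1.0 / n
total = h * sum(C * (i * h) ** 6 * (1 - i * h) ** 4 for i in range(1, n))
print(round(total, 6))
```

Any other value of $C$ would make this integral differ from 1, which is exactly the sense in which $P(x)$ is "only a normalising number".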

I hope it is now clear... otherwise I surrender.