What is the relationship between the beta and binomial distributions in Bayesian inference?


I came across this question: Suppose we are giving two students a multiple-choice exam with 40 questions, where each question has four choices. We don't know how much the students have studied for this exam, but we think that they will do better than just guessing randomly. a: What is our likelihood? b: What prior should we use?

The solutions were: a: Likelihood is Binomial(40, theta); b: The conjugate prior is a beta prior.

Could someone please explain why the beta is the conjugate prior of a binomial? I mean, how could one know that $\theta$ is distributed as a $\mathrm{Beta}$? Could other distributions be used with a binomial likelihood, and what is the consequence of not using a $\mathrm{Beta}$ prior?

Thanks in advance.


There are 3 answers below.

BEST ANSWER

The following is a simple and intuitive explanation for your question:

Let's have a binomial distribution (the data model)

$$\mathbb{P}[X=x]=\binom{n}{x}\theta^{x}(1-\theta)^{n-x}$$

where the support is $x=0,1,2,...,n$

This is a DISCRETE random variable, where $X$ is the variable, $n$ is known, and $\theta \in [0,1]$ is a parameter.

Now let's change the point of view and regard this expression as a (continuous) function of the variable $\theta$.

Since it is now a function of $\theta$, we can first discard all the quantities that do not depend on $\theta$, getting

$$f(\theta|x)\propto\theta^{x}(1-\theta)^{n-x}=\theta^{(x+1)-1}(1-\theta)^{(n-x+1)-1}$$

Now $\theta$ is the variable and $x$ is a parameter (the observed data), and we recognize the kernel of a Beta distribution:

$$f(\theta|x) \propto \mathrm{Beta}(x+1,\; n-x+1)$$

Now I think it is very easy to verify that the Beta is exactly the conjugate prior of the binomial model.
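This is also easy to check numerically: normalizing the kernel $\theta^{x}(1-\theta)^{n-x}$ over $[0,1]$ reproduces the $\mathrm{Beta}(x+1,\,n-x+1)$ density. A minimal stdlib-Python sketch (the values of $n$ and $x$ below are arbitrary illustrative choices, not from the question):

```python
import math

# Illustrative values (assumptions): n trials, x successes
n, x = 40, 29

def beta_pdf(theta, a, b):
    """Beta(a, b) density, built from the gamma function (stdlib only)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * theta**(a - 1) * (1 - theta)**(b - 1)

def likelihood_kernel(theta):
    """Binomial likelihood viewed as a function of theta, constants dropped."""
    return theta**x * (1 - theta)**(n - x)

# Normalize the kernel numerically on a fine grid (midpoint rule).
m = 100_000
grid = [(i + 0.5) / m for i in range(m)]
z = sum(likelihood_kernel(t) for t in grid) / m

# The normalized kernel should match the Beta(x+1, n-x+1) density.
for theta in (0.5, 0.7, 0.8):
    print(likelihood_kernel(theta) / z, beta_pdf(theta, x + 1, n - x + 1))
```

The two printed columns agree up to quadrature error, confirming that the binomial likelihood, seen as a function of $\theta$, is (proportional to) a Beta density.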

Important observation: to identify the family of conjugate priors for a statistical model, there is a very useful factorization theorem.

Let's suppose that the model can be written in the following way:

$$p(\mathbf{x}|\theta)=g[t(\mathbf{x}),n,\theta]\cdot\psi(\mathbf{x})$$

for all $\mathbf{x},\theta$, assuming that $g$, viewed as a function of $\theta$, is integrable over the whole parameter space $\Theta$.

Then the family

$$\pi(\theta)\propto g(s,m,\theta)$$

is the conjugate prior, where the hyperparameters $s$ and $m$ play the roles of $t(\mathbf{x})$ and $n$.

Applying this theorem to the binomial model, with $t(\mathbf{x})=x$, you immediately identify

$$g(t,n,\theta)=\theta^{t}(1-\theta)^{n-t}$$

thus the conjugate prior must be of the form

$$\theta^{a}(1-\theta)^{b}$$

which is obviously the kernel of a Beta distribution (to ensure it is a density you have to multiply it by the normalization constant, of course, but that is not a problem, as the Beta distribution is a known density).
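As a sketch of how this conjugacy is used in practice, here is the resulting closed-form update rule in Python. The prior $\mathrm{Beta}(8,4)$ and the score of 33 out of 40 are assumptions for illustration, loosely matching the exam example's belief that students do better than the 0.25 guessing rate:

```python
def beta_binomial_update(a, b, x, n):
    """Conjugate update: Beta(a, b) prior plus x successes in n trials
    gives a Beta(a + x, b + n - x) posterior."""
    return a + x, b + n - x

# Illustrative prior (assumption): Beta(8, 4) puts most mass above 0.25.
a, b = 8, 4
# Illustrative data (assumption): 33 correct answers out of 40 questions.
x, n = 33, 40

a_post, b_post = beta_binomial_update(a, b, x, n)
post_mean = a_post / (a_post + b_post)  # mean of Beta(a, b) is a / (a + b)
print((a_post, b_post), post_mean)
```

No integration is needed: the posterior is again a Beta, and its parameters are obtained by simple addition, which is exactly the convenience that conjugacy buys.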

Here you can find a very useful table of the most common models with their priors, posteriors, parameters, and so on.


The point is that if the prior is a beta distribution and the likelihood comes from a binomial distribution, then the posterior is again a beta distribution.


If $\Theta \sim Beta(\alpha,\beta)$ and $X|\Theta=\theta\sim B(n,\theta)$ then it turns out that $$\Theta|X=x \sim Beta(\alpha+x,\beta+n-x)$$ which you verify yourself by evaluating its density: $$f_{\Theta|X=x}(\theta|x)=\frac{f_{X|\Theta=\theta}(x|\theta)f_{\Theta}(\theta)}{\int_{0}^{1}f_{X|\Theta=\theta}(x|\theta)f_{\Theta}(\theta)d\theta}$$ Here we have $$f_{X|\Theta=\theta}(x|\theta)={n \choose x}\theta^x(1-\theta)^{n-x}$$ for $x=0,1,...,n$ while $$f_\Theta(\theta)=\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}$$ for $\theta \in [0,1]$. You can, of course, assign alternative priors for $\Theta$, but the posterior density $\Theta|X=x$ may not belong to the same "family" of distributions.