Posterior beta from prior beta distribution (features binomial)


We have the following trial: a group of people, and we count the males. A person is male with probability $\theta$. Now we need to estimate this parameter using additional data. As a starting point, we have a beta distribution $Beta(\theta \mid a, b)$. I know that the number $m$ of males in a group of size $N$ is distributed $Bin(m \mid \theta, N)$. Also, the precise definition of $Beta$ is

$$Beta(\theta|a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}$$

I need to show that when we get additional evidence (like a new sample), our beta distribution reflects this new evidence in its parameters $a, b$: in fact, we get a new beta distribution from the old one.

Now, all the examples I found actually show the following:

$$Beta(\theta \mid a_1, b_1) \propto \binom{N}{m}\theta^m(1-\theta)^{N-m}\cdot\frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\theta^{a_0-1}(1-\theta)^{b_0-1},$$

but none of them explain why and how they get the $Beta$ on the left. I understand that it is Bayes' rule, and I understand what happens in practice (the curve concentrates around the true $\theta$ as we gather more examples), but I don't get the algebra.

Of course, we can combine the powers to leave only $\theta^{m+a_0-1}(1-\theta)^{N+b_0-m-1}$; this is what other examples show. But what happens to the binomial coefficient? What happens to the Gamma functions? How can they simply be dropped? We know that the posterior $Beta$ should be $$\frac{\Gamma(a_1+b_1)}{\Gamma(a_1)\Gamma(b_1)}\theta^{a_1-1}(1-\theta)^{b_1-1}.$$ Doesn't that require showing what $a_1$ and $b_1$ actually are in terms of $a_0$, $b_0$, $N$ and $m$?

Furthermore, what happens to the normalizing denominator of Bayes' rule? It should be $\Pr(m)$ (where $m$ is the number of males in the sample). How is it left out of all the proofs, and why do authors simply write that it is a constant we can drop? Doesn't it directly influence the posterior we get? And, in this particular case, shouldn't we also show how this denominator contributes to the new $Beta$ formula?

Best answer:

Your two questions are essentially the same.

The quick reply is that we can ignore multiplicative constants until the very end, because they are only needed to ensure that the posterior density integrates to $1$ over its support. In a sense, this is exactly what the "normalizing denominator" does.

So, in the case of a posterior density $p(\theta\mid m) \propto \theta^{m+a_0-1}(1-\theta)^{N+b_0-m-1}$ with support $\theta \in [0,1]$, the normalizing denominator must be $$\int_0^1\theta^{m+a_0-1}\,(1-\theta)^{N+b_0-m-1}\, d\theta = B(m+a_0,\,N+b_0-m)$$ to give a probability density which integrates to $1$, where $B(x,y)=\dfrac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$ is the Beta function.
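You can check this numerically. The sketch below (the values $a_0=2$, $b_0=3$, $N=10$, $m=6$ are hypothetical, chosen only for illustration) integrates the unnormalized posterior kernel over $[0,1]$ with a simple midpoint rule and compares it to the closed-form Beta function:

```python
import math

# Hypothetical example values, not from the original post:
# prior Beta(2, 3), sample of N = 10 people with m = 6 males.
a0, b0, N, m = 2, 3, 10, 6

def beta_fn(x, y):
    """Euler Beta function B(x, y) = Gamma(x)Gamma(y) / Gamma(x+y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def posterior_kernel(theta):
    """The unnormalized posterior kernel theta^(m+a0-1) (1-theta)^(N+b0-m-1)."""
    return theta ** (m + a0 - 1) * (1 - theta) ** (N + b0 - m - 1)

# Midpoint-rule numerical integration over [0, 1].
n = 100_000
integral = sum(posterior_kernel((i + 0.5) / n) for i in range(n)) / n

print(integral)                     # numerical normalizing constant
print(beta_fn(m + a0, N + b0 - m))  # closed form B(m+a0, N+b0-m)
```

The two printed numbers agree to high precision, confirming that the integral of the kernel is exactly the Beta function evaluated at the updated parameters.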

A slower reply providing a justification for this comes from using $$p(\theta\mid m) = \dfrac{\Pr(m\mid \theta)\, p(\theta)}{\Pr(m)}$$

which involves finding $\Pr(m)$ explicitly using

$$\Pr(m)=\int_{\theta=0}^1 \Pr(m\mid \theta)p(\theta) d\theta = \int_{\theta=0}^1 \binom{N}{m}\theta^m(1-\theta)^{N-m} \frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\theta^{a_0-1}(1-\theta)^{b_0-1}\,d\theta $$
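Since neither the binomial coefficient nor the ratio of Gamma functions depends on $\theta$, they can be pulled out of the integral, and what remains is exactly the Beta-function integral from before:

$$\Pr(m)=\binom{N}{m}\frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\int_0^1 \theta^{m+a_0-1}(1-\theta)^{N+b_0-m-1}\,d\theta = \binom{N}{m}\frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\,B(m+a_0,\,N+b_0-m)$$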

so

$$p(\theta\mid m) = \dfrac{ \binom{N}{m}\theta^m(1-\theta)^{N-m} \frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\theta^{a_0-1}(1-\theta)^{b_0-1}\, }{\int_{\theta=0}^1 \binom{N}{m}\theta^m(1-\theta)^{N-m} \frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\theta^{a_0-1}(1-\theta)^{b_0-1}\,d\theta } $$

(if you will excuse using $\theta$ both as a free variable in the numerator and as a bound variable in the denominator). If you then cancel the identical constants $\binom{N}{m}$ and $\frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}$ appearing in the numerator and denominator, this becomes

$$p(\theta\mid m) = \dfrac{ \theta^m(1-\theta)^{N-m} \theta^{a_0-1}(1-\theta)^{b_0-1}\, }{\int_{\theta=0}^1 \theta^m(1-\theta)^{N-m} \theta^{a_0-1}(1-\theta)^{b_0-1}\,d\theta } $$ which is precisely the calculation used above for the normalizing denominator. Evaluating that denominator as $B(m+a_0, N+b_0-m)$ shows that the posterior is $Beta(\theta \mid a_1, b_1)$ with $a_1 = a_0 + m$ and $b_1 = b_0 + N - m$: the prior counts are simply incremented by the observed successes and failures.
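To see the whole argument end to end, the sketch below (again with hypothetical values $a_0=2$, $b_0=3$, $N=10$, $m=6$) computes $\Pr(m)$ by numerical integration, evaluates the Bayes-rule posterior at a few points, and compares it to the density of $Beta(a_0+m,\, b_0+N-m)$:

```python
import math

# Hypothetical example values, not from the original post.
a0, b0, N, m = 2, 3, 10, 6
a1, b1 = a0 + m, b0 + N - m   # conjugate update claimed above

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * theta ** (a - 1) * (1 - theta) ** (b - 1)

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Pr(m): marginal likelihood via midpoint-rule integration over theta.
ngrid = 100_000
pr_m = 0.0
for i in range(ngrid):
    t = (i + 0.5) / ngrid
    pr_m += binom_pmf(m, N, t) * beta_pdf(t, a0, b0)
pr_m /= ngrid

# Bayes' rule posterior at a few points vs. the Beta(a1, b1) density.
for theta in (0.25, 0.5, 0.75):
    bayes = binom_pmf(m, N, theta) * beta_pdf(theta, a0, b0) / pr_m
    print(theta, bayes, beta_pdf(theta, a1, b1))
```

At every $\theta$ the Bayes-rule value and the $Beta(a_1, b_1)$ density coincide, which is the whole point: dividing by $\Pr(m)$ is exactly what turns the product of kernels back into a properly normalized beta density.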