Consider Bernoulli random variables $m$ and $k$ given by counting heads ($m$) and tails ($k$) when flipping a coin $N$ times, $N = m + k$, according to the probability distribution:
$$Ber(m,k~|~\mu)=\mu^m~(1-\mu)^k$$
The prior distribution for $\mu$ is given by the beta distribution:
$$Beta(\mu~|~a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\mu^{a-1}~(1-\mu)^{b-1}$$
Show that the posterior mean value of $\mu$ lies between the prior mean and the maximum likelihood estimate for $\mu$.
To do this, show that the posterior mean can be written as $\lambda$ times the prior mean plus $(1-\lambda)$ times the maximum likelihood estimate, where $0 \lt \lambda \lt 1$.
This illustrates the concept of the posterior distribution being a compromise between the prior distribution and the maximum likelihood solution.
How to get started on this one?
(Textbook in question: "Pattern Recognition and Machine Learning", Christopher M. Bishop, 8th printing, 2006, page 129, exercise 2.7)
$a_0$ and $b_0$ represent the effective numbers of heads ($a_0$) and tails ($b_0$) that we assume before seeing any data, parameterizing the beta distribution $Beta(\mu~|~a_0, b_0)$ that encodes our initial belief about $\mu$ (the prior). The mean of the prior is given by the formula for the mean of a beta distribution:
$$\mu_{prior} =\frac{a_0}{a_0+b_0}$$
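As a quick sanity check on that formula, the sketch below (plain-Python, illustrative parameter values) integrates $\mu \cdot Beta(\mu~|~a,b)$ numerically over $(0,1)$ and compares the result to $a/(a+b)$:

```python
import math

def beta_pdf(mu, a, b):
    """Beta(mu | a, b) density, using the gamma-function normalizer."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * mu ** (a - 1) * (1 - mu) ** (b - 1)

def beta_mean_numeric(a, b, n=100_000):
    """Approximate E[mu] by midpoint integration on (0, 1)."""
    h = 1.0 / n
    return sum(beta_pdf((i + 0.5) * h, a, b) * (i + 0.5) * h
               for i in range(n)) * h

# Example prior pseudo-counts (chosen arbitrarily for the check)
a0, b0 = 2.0, 3.0
print(beta_mean_numeric(a0, b0))   # close to a0 / (a0 + b0) = 0.4
```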
After $N$ flips of the coin, we use this additional knowledge to form the MLE (maximum likelihood estimate) from what we observed, where $a_1$ is the number of heads observed and $b_1$ is the number of tails observed. Thus, the MLE is:
$$\mu_{ML} = \frac{a_1}{a_1+b_1}$$
We want to prove that the posterior mean:
$$\frac{(a_0+a_1)}{(a_0+a_1+b_0+b_1)}$$
which combines the prior pseudo-counts with the observed counts, lies between the MLE:
$$\mu_{ML} = \frac{a_1}{a_1+b_1}$$
and the prior mean:
$$\mu_{prior} = \frac{a_0}{a_0+b_0}$$
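Before doing the algebra, a numeric sanity check of the claim (the counts below are made-up examples, not from the exercise):

```python
# Illustrative counts: prior pseudo-counts (a0, b0) and observed heads/tails (a1, b1)
a0, b0 = 2, 8        # prior leans toward tails
a1, b1 = 30, 10      # data lean toward heads

prior_mean = a0 / (a0 + b0)                       # 0.2
mle_mean = a1 / (a1 + b1)                         # 0.75
posterior_mean = (a0 + a1) / (a0 + a1 + b0 + b1)  # 0.64

lo, hi = sorted((prior_mean, mle_mean))
print(lo <= posterior_mean <= hi)   # True: posterior mean is between the two
```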
To do this, we show that there exists a $\lambda$ with $0 \lt \lambda \lt 1$ such that:
$$\frac{a_0+a_1}{a_0+a_1+b_0+b_1} = \lambda \frac{a_0}{a_0+b_0} + (1-\lambda)\frac{a_1}{a_1+b_1}$$
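One way to proceed (a sketch of the remaining algebra): guess that $\lambda$ is the fraction of the total count contributed by the prior,

$$\lambda = \frac{a_0+b_0}{a_0+a_1+b_0+b_1}, \qquad 1-\lambda = \frac{a_1+b_1}{a_0+a_1+b_0+b_1}$$

and verify by direct substitution:

$$\lambda\,\frac{a_0}{a_0+b_0} + (1-\lambda)\,\frac{a_1}{a_1+b_1} = \frac{a_0}{a_0+a_1+b_0+b_1} + \frac{a_1}{a_0+a_1+b_0+b_1} = \frac{a_0+a_1}{a_0+a_1+b_0+b_1}$$

Since $a_0, b_0 \gt 0$ and (assuming at least one coin flip is observed) $a_1 + b_1 \gt 0$, both the numerator and denominator of $\lambda$ are positive and the numerator is strictly smaller, so $0 \lt \lambda \lt 1$ as required.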