Do prior hyperparameters update as you take successive measurements in the case of Gaussian unknown mean?


I am trying to use conjugate priors to estimate the mean $\mu$ of a Gaussian with known variance $\sigma^2$. The derivation shows that the conjugate prior is:

$p(\mu) = N(\mu | \mu_0, \sigma_0^2)$

Following through with this, I come up with an estimate for $\mu$:

$\mu_N = \frac{\sigma^2\mu_0}{N\sigma_0^2+\sigma^2} + \frac{N\sigma_0^2\mu_{ML}}{N\sigma_0^2+\sigma^2}$

In this case, $\mu_{ML} = \frac{1}{N}\sum x_i$, which I have.
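As a quick sanity check, the posterior mean formula can be evaluated directly. Here is a minimal Python sketch (the data, true mean, and hyperparameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 4.0               # known data variance sigma^2
mu0, sigma0_2 = 0.0, 1.0   # prior hyperparameters (illustrative choices)

# N = 50 draws from a Gaussian with (hypothetical) true mean 2
x = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=50)
N = len(x)
mu_ml = x.mean()

# Posterior mean: a precision-weighted blend of the prior mean and the ML estimate
mu_N = (sigma2 * mu0 + N * sigma0_2 * mu_ml) / (N * sigma0_2 + sigma2)

# mu_N lies between mu0 and mu_ml, pulled toward mu_ml as N grows
print(mu0, mu_ml, mu_N)
```

For this N the weight on $\mu_{ML}$ is $N\sigma_0^2/(N\sigma_0^2+\sigma^2) = 50/54$, so the posterior mean sits much closer to the ML estimate than to the prior mean.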

Now my question is: I worked through a fairly similar example with the Bernoulli distribution, where the hyperparameters 'updated' with each successive measurement. The notation in the references I've been using is a bit ambiguous on this, but I'm wondering if the same applies here. In particular, the posterior from one measurement yields the new hyperparameters for the prior of the next measurement.

In this one, though, it seems like you just pick hyperparameters and stick with them, $\mu_0, \sigma_0^2$. There doesn't seem to be any indication that the next prior has a different $\mu_0$ than the previous one. Is this the case? Am I skipping that step and just using the $\mu_N$ equation to sequentially estimate $\mu$, or are the hyperparameters in fact changing with each new measurement taken?

There are 2 solutions below.

BEST ANSWER

I've checked some sources and confirmed the results in MATLAB. Updating the prior hyperparameters after every measurement and keeping a static $\mu_0$ give essentially the same answer; the sequential version just costs more computation. For small $N$ the posterior leans toward the prior, while for large $N$ it leans toward $\mu_{ML}$. So even if I kept updating the prior after each measurement, for large $N$ it would make little difference, since the result converges to $\mu_{ML}$ either way. For large $N$, which is my case, the result is essentially the same; for smaller $N$, updating $\mu_0$ with each measurement is something you can consider.
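The large-$N$ behavior described above is easy to demonstrate: with many observations, two very different priors produce nearly identical posterior means. A short Python sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0
# Large sample from a Gaussian with (hypothetical) true mean 3
x = rng.normal(loc=3.0, scale=np.sqrt(sigma2), size=10_000)
N, mu_ml = len(x), x.mean()

# Two deliberately different priors (illustrative values)
results = []
for mu0, sigma0_2 in [(0.0, 0.5), (10.0, 2.0)]:
    mu_N = (sigma2 * mu0 + N * sigma0_2 * mu_ml) / (N * sigma0_2 + sigma2)
    results.append(mu_N)
    # The prior's weight, sigma2 / (N * sigma0_2 + sigma2), is tiny for large N,
    # so both posterior means land essentially on mu_ml
    print(mu0, mu_N - mu_ml)
```

Both posterior means differ from $\mu_{ML}$ by far less than the two prior means differ from each other, which is why the choice of $\mu_0$ stops mattering for large $N$.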

ANSWER

In the prior the hyperparameters are $\mu_0$ and $\sigma_0^2$. Given that you've got the correct $\mu_N$ you've probably already done most of the following:

We have the prior

$$p(\mu)\propto \exp\left(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right)$$

Given we observe $(x_1,\dots,x_N)$ the likelihood is

$$L(x|\mu)\propto\prod_{i=1}^N\exp\left(-\frac{(\mu-x_i)^2}{2\sigma^2}\right)=\exp\left(-\sum_{i=1}^N\frac{(\mu-x_i)^2}{2\sigma^2}\right)$$

So by Bayes

$$\begin{align*} p(\mu|x)&\propto \exp\left(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right) \exp\left(-\sum_{i=1}^N\frac{(\mu-x_i)^2}{2\sigma^2}\right)\\ &\propto\dots\\ &\propto\exp\left(-\frac{1}{2}\left(\left(\frac{1}{\sigma_0^2}+\frac{N}{\sigma^2}\right)\mu^2-2\left(\frac{\mu_0}{\sigma_0^2}+\frac{N\mu_{ML}}{\sigma^2}\right)\mu\right)\right)\\ &\propto\exp\left(-\frac{1}{2}\left(\frac{1}{\sigma_0^2}+\frac{N}{\sigma^2}\right)\left(\mu^2-2\mu_N\mu\right)\right)\\ \end{align*}$$ where $\mu_N$ is the estimate you already worked out. Completing the square in this expression for the posterior gives

$$p(\mu|x)\propto\exp\left(-\frac{1}{2}\left(\frac{1}{\sigma_0^2}+\frac{N}{\sigma^2}\right)\left(\mu-\mu_N\right)^2\right)$$ This is a normal with mean $\mu_N$ and variance $$\left(\frac{1}{\sigma_0^2}+\frac{N}{\sigma^2}\right)^{-1}.$$ So you should use the new hyperparameters $\mu_N$ and $\left(\frac{1}{\sigma_0^2}+\frac{N}{\sigma^2}\right)^{-1}$.
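This answers the original question directly: you can feed each posterior back in as the next prior, and processing the data one point at a time reaches exactly the same hyperparameters as the batch formula. A small Python sketch with made-up values confirms the equivalence:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 2.0                 # known data variance
mu0, sigma0_2 = 1.0, 3.0     # initial prior hyperparameters (illustrative)
x = rng.normal(0.0, np.sqrt(sigma2), size=25)
N = len(x)

# Batch update: all N points at once, using the formulas above
prec_N = 1.0 / sigma0_2 + N / sigma2          # posterior precision
var_N = 1.0 / prec_N
mu_N = var_N * (mu0 / sigma0_2 + x.sum() / sigma2)

# Sequential update: the posterior after each point becomes the next prior
m, v = mu0, sigma0_2
for xi in x:
    prec = 1.0 / v + 1.0 / sigma2             # the N = 1 case of the same formula
    v_new = 1.0 / prec
    m = v_new * (m / v + xi / sigma2)
    v = v_new

# m == mu_N and v == var_N up to floating-point error
print(m - mu_N, v - var_N)
```

So the hyperparameters do change with each measurement if you update sequentially; the batch formula with the original $\mu_0, \sigma_0^2$ simply collapses all of those updates into one step.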