Normal distribution with unknown mean and variance


I'm reading Bayesian Data Analysis by Gelman et al., and I came across an example which starts like this:

Example. Normal distribution with unknown mean and variance.

We illustrate the approximate normal distribution with a simple theoretical example. Let $y_1, . . . , y_n$ be independent observations from a $N(\mu, \sigma^2)$ distribution, and, for simplicity, we assume a uniform prior density for $(\mu, \log \sigma)$. We set up a normal approximation to the posterior distribution of $(\mu, \log \sigma)$, which has the virtue of restricting $\sigma$ to positive values. To construct the approximation, we need the second derivatives of the log posterior density,

$$\log p(\mu, \log\sigma \,|\, y) = \text{constant} - n \log \sigma - \frac{1}{2\sigma^2} \left((n - 1)s^2 + n(\overline{y} - \mu)^2\right),$$

where $\overline{y}=\frac1n\sum_{i=1}^n y_i$ and $s^2=\frac{1}{n-1}\sum_{i=1}^n(y_i-\overline{y})^2$.

This log of the posterior is escaping me for some reason... I'm not fully following how the author arrived at the expression above. I did try to compute the posterior as the product of the likelihood and the prior, but I didn't get the same result.

Is there a change of variables etc. happening under the hood in this example or what?

In summary, my question is: how do we arrive at the log posterior expression? What are the steps?

Thank you for your help!


Best Answer

For this you start by writing out the log-likelihood $$ \begin{align} \log p(y\mid\mu,\sigma) &= \sum_i \log p(y_i \mid \mu,\sigma) \\ &= \sum_i \left(-\frac{(y_i-\mu)^2}{2\sigma^2} - \log\left(\sqrt{2\pi\sigma^2}\right) \right)\\ &= -n\log \sigma - \frac{1}{2\sigma^2}\sum_i(y_i-\mu)^2 + \text{constant}. \end{align} $$ What happens next is the following decomposition of the sum-of-squares term: $$ \begin{align} \sum_i (y_i - \mu)^2 &= \sum_i \left( (y_i - \bar{y}) + (\bar{y} - \mu)\right)^2 \\ &= \sum_i \left((y_i-\bar{y})-(\mu-\bar{y})\right)^2. \end{align} $$ Can you take it from here?
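The cross term in that expansion vanishes because $\sum_i (y_i - \bar{y}) = 0$, so the decomposition reduces to $\sum_i (y_i - \mu)^2 = \sum_i (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2$. A quick numeric sanity check of this identity, on made-up data and an arbitrary test value of $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=50)  # hypothetical sample
mu = 1.0                                     # arbitrary test value of mu
ybar = y.mean()

lhs = np.sum((y - mu) ** 2)
# cross term drops out because sum(y_i - ybar) = 0
rhs = np.sum((y - ybar) ** 2) + len(y) * (ybar - mu) ** 2
print(np.isclose(lhs, rhs))  # True
```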

OK, if you are happy with that rearrangement of the likelihood, then turning to the Bayesian aspects, we are looking for the posterior $$ p(\mu,\log \sigma \mid y) \propto p(y\mid\mu,\log \sigma) \, \pi(\mu,\log \sigma), $$ where $\pi(\cdot)$ is our prior. This is assumed to be uniform, which is also sometimes written as $$ \pi(\log \sigma) \propto 1. $$ Now, in specifying a uniform prior there is a little hand-waving going on: this is an improper prior if $\log \sigma$ is to have infinite support. I'm sure Gelman discusses this further in the book; a few relevant points are that improper priors can still lead to proper posteriors, and that we can compactify the infinite support if we want. Anyway, assuming that $\mu$ also has a uniform prior (independently of $\log\sigma$), taking the log of the posterior gives $$ \begin{align} \log p(\mu,\log \sigma \mid y) &= \log p(y \mid \mu,\log \sigma ) + \log \pi(\log \sigma) + \log \pi(\mu) + \text{constant} \\ &= \log p(y\mid\mu,\log \sigma) + \text{constant}, \end{align} $$ where to get the second line we used that both priors are proportional to a constant. As it stands, the $\log \sigma$ on the right-hand side is just a reparameterisation of the usual log-likelihood term, which makes no difference as far as writing the term out. Where it *will* make a difference is when we proceed to differentiate in order to construct an approximation: we will then be differentiating with respect to $\log \sigma$ and not $\sigma$.

Finally, why carry out this change of variables? I suspect the author is going to construct a Laplace approximation, which amounts to a Gaussian approximation around the posterior mode. If we were to construct this in the original $(\mu,\sigma)$ parameterisation, the Gaussian would have support $\mathbb{R}^2$ and so would assign positive probability to negative values of $\sigma$.
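To make that concrete: the mode and curvature aren't derived in this thread, but assuming the standard results for this model (mode at $(\bar{y}, \log\hat\sigma)$ with $\hat\sigma^2 = (n-1)s^2/n$, and a diagonal negative Hessian with entries $n/\hat\sigma^2$ and $2n$), a sketch of the normal approximation in $(\mu, \log\sigma)$ might look like this; note that every sampled $\sigma$ is positive by construction, which is the whole point of the reparameterisation:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=40)   # hypothetical data
n = len(y)
ybar = y.mean()
s2 = y.var(ddof=1)

# posterior mode of (mu, log sigma) under the uniform prior on (mu, log sigma)
mu_hat = ybar
sigma2_hat = (n - 1) * s2 / n
t_hat = 0.5 * np.log(sigma2_hat)              # t = log sigma

# covariance of the normal approximation = inverse of the negative Hessian
# at the mode, which is diagonal here (the cross term vanishes at mu = ybar)
cov = np.diag([sigma2_hat / n, 1.0 / (2 * n)])

draws = rng.multivariate_normal([mu_hat, t_hat], cov, size=10_000)
sigma_draws = np.exp(draws[:, 1])             # back-transform log sigma -> sigma
print(sigma_draws.min() > 0)                  # True: sigma is always positive
```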

Answer

Based on the answer by @Nadiels, I'm trying to go through the math to arrive at the same log posterior as the author did in my question. I will continue from where @Nadiels left off:

So we have arrived at:

$$\log p(y|\mu,\sigma^2)= \text{constant}-n\log \sigma-\frac{1}{2\sigma^2}\sum_{i=1}^n ((y_i-\overline{y})-(\mu-\overline{y}))^2$$

$$=\text{constant}-n\log \sigma-\frac{1}{2\sigma^2}\sum_{i=1}^n \left(\color{red}{ (y_i-\overline{y})^2}-\color{green}{2(y_i-\overline{y})(\mu-\overline{y})}+\color{blue}{(\mu-\overline{y})^2}\right)$$

$$=\text{constant}-n\log \sigma-\frac{1}{2\sigma^2}\left(\color{red}{\sum_{i=1}^n (y_i-\overline{y})^2}-\color{green}{2(\mu-\overline{y})\sum_{i=1}^n(y_i-\overline{y})}+ \color{blue}{n(\overline{y}-\mu)^2} \right)$$

$$=\text{constant}-n\log \sigma-\frac{1}{2\sigma^2}\left(\color{red}{(n-1)s^2}-\color{green}{2(\mu-\overline{y})(n\overline{y}-n\overline{y})}+ \color{blue}{n(\overline{y}-\mu)^2} \right)$$

$$=\text{constant}-n\log \sigma-\frac{1}{2\sigma^2}\left((n-1)s^2+ n(\overline{y}-\mu)^2 \right),$$
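The derivation above can be verified numerically: up to the same additive constant, the direct sum-of-squares form and the compact $(n-1)s^2 + n(\overline{y}-\mu)^2$ form of the log-likelihood agree at any test point. A quick check on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=30)
mu, sigma = 0.3, 1.2                 # arbitrary test point
n = len(y)
ybar, s2 = y.mean(), y.var(ddof=1)   # s2 uses the 1/(n-1) convention

# log-likelihood up to the additive constant, two equivalent forms
direct = -n * np.log(sigma) - np.sum((y - mu) ** 2) / (2 * sigma**2)
compact = -n * np.log(sigma) - ((n - 1) * s2 + n * (ybar - mu) ** 2) / (2 * sigma**2)
print(np.isclose(direct, compact))   # True
```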

so we got the same result as Gelman did in the book. Now this result indicates that we should have $p(y|\mu,\sigma^2)=p(y|\mu, \log \sigma)$. However, this is somewhat unclear to me since:

$$p(y|\mu,\sigma^2)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(y_i-\mu)^2\right)=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\mu)^2\right),$$

and

$$p(y|\mu,\log\sigma)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\log\sigma}}\exp\left(-\frac{1}{2\log\sigma}(y_i-\mu)^2\right)$$

$$=\frac{1}{(2\pi\log\sigma)^{n/2}}\exp\left(-\frac{1}{2\log\sigma}\sum_{i=1}^n(y_i-\mu)^2\right),$$

So now to me it seems that $p(y|\mu,\sigma^2)\neq p(y|\mu,\log\sigma)$. Take for example $\sigma=e,$

$$p(y|\mu,e^2)=\frac{1}{(2\pi e^2)^{n/2}}\exp\left(-\frac{1}{2e^2}\sum_{i=1}^n(y_i-\mu)^2\right)$$

$$p(y|\mu,\log e)= p(y|\mu,1)=\frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^n(y_i-\mu)^2\right)$$
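As @Nadiels's answer notes, $\log\sigma$ in $p(y\mid\mu,\log\sigma)$ is just a reparameterisation: the conditioning variable changes, but the density formula still uses $\sigma^2$ as the variance, so converting $t = \log\sigma$ back to $\sigma$ recovers the same value. Literally substituting $\log\sigma$ into the variance slot, as in the two expressions above, is a different (and incorrect) reading of the notation. A quick check of both readings, on made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=10)
n = len(y)
sigma = np.e                          # the example value sigma = e

def log_lik(y, mu, sigma):
    """Gaussian log-likelihood, parameterized by sigma itself."""
    return (-n * np.log(sigma)
            - np.sum((y - mu) ** 2) / (2 * sigma**2)
            - 0.5 * n * np.log(2 * np.pi))

# reparameterisation: pass t = log sigma, convert back -> identical value
t = np.log(sigma)
reparam = log_lik(y, 0.0, np.exp(t))
print(np.isclose(reparam, log_lik(y, 0.0, sigma)))  # True

# literal substitution of log sigma for sigma^2 in the variance slot,
# as in the question's final expression -> a different number
naive = -0.5 * n * np.log(2 * np.pi * t) - np.sum(y**2) / (2 * t)
print(np.isclose(naive, log_lik(y, 0.0, sigma)))
```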