Showing that the posterior distribution takes the same form as the prior in probabilistic linear regression without fixed precision


In Christopher M. Bishop's Pattern Recognition and Machine Learning it is stated that when we have an unknown mean and unknown precision, the conjugate prior is a normal-gamma distribution.

Suppose our likelihood is as follows:

$p(y \mid \Phi, w, \beta) = \prod_{i=1}^{N} \mathcal{N}\left(y_i \mid w^T \phi(x_i),\, \beta^{-1}\right)$

Then the conjugate prior over both $w$ and $\beta$ should be

$p(w, \beta) = \mathcal{N}\left(w \mid m_0, \beta^{-1} S_0\right) \mathrm{Gamma}\left(\beta \mid a_0, b_0\right)$

How can I show that the posterior distribution takes the same form as the prior? In other words, how can I prove

$p(w, \beta \mid \mathcal{D}) = \mathcal{N}\left(w \mid m_N, \beta^{-1} S_N\right) \mathrm{Gamma}\left(\beta \mid a_N, b_N\right)$

I believe giving expressions for $m_N$, $S_N$, $a_N$, and $b_N$ would make this easier. Unfortunately, the normal and gamma distributions are still quite opaque to me. Any help would be appreciated.

Best Answer

I don't think it's necessary to reproduce the derivation here, since it is just a standard (if long) computation. You can find a complete derivation here:

https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf
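For reference, letting $\Phi$ denote the $N \times M$ design matrix whose $i$-th row is $\phi(x_i)^T$, the updated hyperparameters are the standard normal-gamma updates (please double-check the exact parameterisation against that note):

$$S_N^{-1} = S_0^{-1} + \Phi^T \Phi$$
$$m_N = S_N \left( S_0^{-1} m_0 + \Phi^T y \right)$$
$$a_N = a_0 + \frac{N}{2}$$
$$b_N = b_0 + \frac{1}{2} \left( y^T y + m_0^T S_0^{-1} m_0 - m_N^T S_N^{-1} m_N \right)$$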

If anything is unclear, I'm happy to explain!

The model is, I believe, $y = m + \epsilon$, where $m$ is the mean function and $\epsilon$ is Gaussian error. The mean function is an unknown function of the inputs $x_i$, $i = 1, \dots, n$, and of a parameter $w$. In your particular setting, the mean function is a linear combination of basis functions $\phi$ applied to your data. As a simple example, with $\phi_i(x) = x^i$ you get $m(w, x) = w_1 x + w_2 x^2 + \dots$, and you need to estimate the parameter $w$.

This is equivalent to working with transformed data: where originally you had $x_{11}, x_{12}, \dots$, $x_{21}, x_{22}, \dots$, you now have $\phi(x_{11}), \phi(x_{12}), \dots$, $\phi(x_{21}), \phi(x_{22}), \dots$, and you estimate $w$ exactly as you would estimate the coefficients $b$ in ordinary linear regression. So when you do inference on $w$, you put a prior on it, in this case a normal prior, and you then have a model for the mean $m(w, x)$. Recall that $\epsilon$ is Gaussian error; its variance also needs to be estimated, which is why you use the normal-gamma (or NIG, if parameterised by the variance) prior.
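As a minimal numerical sketch of this setup, here is the conjugate update with a polynomial basis, assuming the standard normal-gamma formulas (the function and variable names are my own, not from the book or the note):

```python
import numpy as np

def polynomial_design(x, degree):
    """Design matrix with columns phi_j(x) = x**j for j = 0..degree."""
    return np.vander(x, degree + 1, increasing=True)

def posterior_update(Phi, y, m0, S0, a0, b0):
    """Normal-gamma conjugate update for p(w, beta | D).

    Prior: p(w, beta) = N(w | m0, beta^{-1} S0) Gamma(beta | a0, b0).
    Returns the posterior hyperparameters mN, SN, aN, bN.
    """
    N = len(y)
    S0_inv = np.linalg.inv(S0)
    SN_inv = S0_inv + Phi.T @ Phi
    SN = np.linalg.inv(SN_inv)
    mN = SN @ (S0_inv @ m0 + Phi.T @ y)
    aN = a0 + N / 2.0
    bN = b0 + 0.5 * (y @ y + m0 @ S0_inv @ m0 - mN @ SN_inv @ mN)
    return mN, SN, aN, bN

# Synthetic data from a quadratic mean function with noise sd 0.1
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
w_true = np.array([0.5, -1.0, 2.0])
Phi = polynomial_design(x, degree=2)
y = Phi @ w_true + rng.normal(0.0, 0.1, size=200)

# Vague prior, so the posterior is dominated by the data
m0 = np.zeros(3)
S0 = 100.0 * np.eye(3)
mN, SN, aN, bN = posterior_update(Phi, y, m0, S0, a0=1e-3, b0=1e-3)

print(mN)        # posterior mean of w, should be close to w_true
print(aN / bN)   # posterior mean of beta, roughly 1 / 0.1**2 = 100
```

With enough data and a vague prior, the posterior mean of $w$ recovers the true coefficients and $a_N / b_N$ recovers the noise precision, which is a quick sanity check that the updates are right.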

In a standard problem you have $y_1, y_2, \dots$ from a Gaussian distribution with some mean and variance, and you want to estimate both. In your case, in addition to $y$, you also have predictors $x$: the mean from the standard problem is now a linear combination of the $x$, and you need to estimate the weights $w$ (how the $x$ are combined). In other words, if you estimate the mean as in the standard problem and know that it follows the parametric form $w^T \phi(x)$, you can solve for $w$.

If you have specific questions I might be able to provide better help.