Understanding the derivation of automatic relevance determination


I'm studying automatic relevance determination from this paper (see pages 2-3), and I'm having trouble understanding the following part (on page 3 of the paper):

The posterior probability over all the unknown parameters, given the data, is expressed as $P(\textbf{w}, \boldsymbol\alpha, \sigma^2|\textbf{t})$. We are trying to find the $\textbf{w}, \boldsymbol\alpha$ and $\sigma^2$ which maximise this posterior probability. We can decompose the posterior: $$P(\textbf{w}, \boldsymbol\alpha, \sigma^2|\textbf{t})= P(\textbf{w}| \textbf{t}, \boldsymbol\alpha, \sigma^2)\,P(\boldsymbol\alpha, \sigma^2|\textbf{t}) \tag{1.1}$$ Writing $\beta = \sigma^{-2}$ (the noise precision) to make the maths appear less cluttered, the first part of $(1.1)$ can be expressed:

$$P(\textbf{w}| \textbf{t}, \boldsymbol\alpha, \beta) \sim N(\textbf{m}, \boldsymbol\Sigma) \tag{1.2}$$ where the mean $\textbf{m}$ and the covariance $\boldsymbol\Sigma$ are given by: $$\textbf{m} = \beta\boldsymbol\Sigma\boldsymbol\Phi^T\textbf{t} \tag{1.3}$$ $$\boldsymbol\Sigma=(\textbf{A}+\beta\boldsymbol\Phi^T\boldsymbol\Phi)^{-1} \tag{1.4}$$ and $\textbf{A} = \operatorname{diag}(\boldsymbol\alpha)$. The method for arriving at $(1.2), (1.3)$ and $(1.4)$, relating to conditional Gaussian distributions, lies outside the scope of this document.

What I'm interested in is how $(1.2), (1.3)$ and $(1.4)$ are derived.

My question is: how does one explicitly arrive at $(1.2), (1.3)$ and $(1.4)$? Are there any references?

All the details are in the paper linked above. Thank you!
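A sketch of the standard derivation, assuming the usual ARD/RVM set-up that the excerpt does not spell out: targets $\textbf{t} = \boldsymbol\Phi\textbf{w} + \boldsymbol\epsilon$ with $\boldsymbol\epsilon \sim N(\textbf{0}, \beta^{-1}\textbf{I})$ (so $\beta = \sigma^{-2}$ is the noise precision), and an independent zero-mean prior $w_i \sim N(0, \alpha_i^{-1})$, i.e. $P(\textbf{w}|\boldsymbol\alpha) = N(\textbf{0}, \textbf{A}^{-1})$. By Bayes' theorem the posterior over $\textbf{w}$ is proportional to likelihood times prior, and its logarithm is quadratic in $\textbf{w}$, so it can be matched term by term against the log of a Gaussian ("completing the square"):

$$\begin{aligned}
\ln P(\textbf{w}|\textbf{t},\boldsymbol\alpha,\beta) &= \ln P(\textbf{t}|\textbf{w},\beta) + \ln P(\textbf{w}|\boldsymbol\alpha) + \text{const} \\
&= -\frac{\beta}{2}\lVert\textbf{t}-\boldsymbol\Phi\textbf{w}\rVert^2 - \frac{1}{2}\textbf{w}^T\textbf{A}\textbf{w} + \text{const} \\
&= -\frac{1}{2}\textbf{w}^T\left(\textbf{A}+\beta\boldsymbol\Phi^T\boldsymbol\Phi\right)\textbf{w} + \textbf{w}^T\beta\boldsymbol\Phi^T\textbf{t} + \text{const},
\end{aligned}$$

where "const" collects every term that does not depend on $\textbf{w}$. For comparison, $\ln N(\textbf{w}|\textbf{m},\boldsymbol\Sigma) = -\frac{1}{2}\textbf{w}^T\boldsymbol\Sigma^{-1}\textbf{w} + \textbf{w}^T\boldsymbol\Sigma^{-1}\textbf{m} + \text{const}$. Matching the quadratic terms gives $\boldsymbol\Sigma^{-1} = \textbf{A}+\beta\boldsymbol\Phi^T\boldsymbol\Phi$, which is $(1.4)$; matching the linear terms gives $\boldsymbol\Sigma^{-1}\textbf{m} = \beta\boldsymbol\Phi^T\textbf{t}$, i.e. $\textbf{m} = \beta\boldsymbol\Sigma\boldsymbol\Phi^T\textbf{t}$, which is $(1.3)$. Because the log-posterior is a quadratic in $\textbf{w}$ with positive definite $\boldsymbol\Sigma^{-1}$ (for all $\alpha_i > 0$), the posterior itself is Gaussian, which is $(1.2)$.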


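The answers to my questions can be found in the book:

Pattern Recognition and Machine Learning (Christopher M. Bishop).

Look in the sections dealing with multivariate Gaussian distributions, in particular Section 2.3.3 (Bayes' theorem for Gaussian variables). Section 7.2.1 (RVM for regression) states $(1.3)$ and $(1.4)$ in exactly this form.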
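As a numerical sanity check, the sketch below computes $\textbf{m}$ and $\boldsymbol\Sigma$ from $(1.3)$ and $(1.4)$, and separately by conditioning the joint Gaussian of $(\textbf{w}, \textbf{t})$ with the standard partitioned-Gaussian formulas, which is the route the multivariate Gaussian sections take. It assumes the model $\textbf{t} = \boldsymbol\Phi\textbf{w} + \boldsymbol\epsilon$, $\boldsymbol\epsilon \sim N(\textbf{0}, \beta^{-1}\textbf{I})$, $\textbf{w} \sim N(\textbf{0}, \textbf{A}^{-1})$; the function names and the random $\boldsymbol\Phi$, $\boldsymbol\alpha$, $\beta$, $\textbf{t}$ are made-up test values, not taken from the paper.

```python
import numpy as np


def ard_posterior(Phi, t, alpha, beta):
    """Posterior mean and covariance of w via (1.3) and (1.4)."""
    A = np.diag(alpha)
    Sigma = np.linalg.inv(A + beta * Phi.T @ Phi)  # (1.4)
    m = beta * Sigma @ Phi.T @ t                   # (1.3)
    return m, Sigma


def gaussian_conditioning(Phi, t, alpha, beta):
    """Same posterior by conditioning the joint Gaussian of (w, t) on t."""
    A_inv = np.diag(1.0 / alpha)          # prior covariance of w
    N = Phi.shape[0]
    C = np.eye(N) / beta + Phi @ A_inv @ Phi.T  # marginal covariance of t
    K = A_inv @ Phi.T                     # cross-covariance cov(w, t)
    m = K @ np.linalg.solve(C, t)         # E[w | t] (both means are zero)
    Sigma = A_inv - K @ np.linalg.solve(C, K.T)  # cov[w | t]
    return m, Sigma


# Hypothetical test problem: N data points, M basis functions.
rng = np.random.default_rng(0)
N, M = 20, 5
Phi = rng.normal(size=(N, M))
alpha = rng.uniform(0.5, 2.0, size=M)
beta = 4.0
t = Phi @ rng.normal(size=M) + rng.normal(scale=0.5, size=N)

m1, S1 = ard_posterior(Phi, t, alpha, beta)
m2, S2 = gaussian_conditioning(Phi, t, alpha, beta)
print(np.allclose(m1, m2), np.allclose(S1, S2))  # True True
```

The two routes agree because of the Woodbury matrix identity: $(\textbf{A}+\beta\boldsymbol\Phi^T\boldsymbol\Phi)^{-1} = \textbf{A}^{-1} - \textbf{A}^{-1}\boldsymbol\Phi^T(\beta^{-1}\textbf{I} + \boldsymbol\Phi\textbf{A}^{-1}\boldsymbol\Phi^T)^{-1}\boldsymbol\Phi\textbf{A}^{-1}$. The first form inverts an $M \times M$ matrix and the second solves an $N \times N$ system, so which is cheaper depends on whether there are more basis functions or more data points.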