Help Understanding This Linear Algebra


I am working through a paper here and trying to understand all of the linear algebra involved, but I get stuck going from equations 5 and 6 to equation 7. I will skip the bold matrix notation, since everything here is a matrix or vector anyway.

Equation 5

$$ p(y|X, \bar{X}, \bar{f}) = \mathcal{N}(y|K_{NM}K_{M}^{-1}\bar{f}, \Lambda + \sigma^2I) $$

Equation 6

$$ p(\bar{f}|\bar{X}) = \mathcal{N}(\bar{f}|0, K_{M}) $$

Equation 7

The paper says "we find the posterior distribution over pseudo targets $\bar{f}$ using Bayes' rule on (5) and (6)."

$$ p(\bar{f}|\mathcal{D}, \bar{X}) = \mathcal{N}(\bar{f}|K_MQ^{-1}_MK_{MN}(\Lambda + \sigma^2I)^{-1}y, K_MQ_M^{-1}K_M) $$

where

$$ Q_M = K_M + K_{MN}(\Lambda + \sigma^2I)^{-1}K_{NM} $$

By saying that they used Bayes' rule, they are saying that they are doing the following...

$$ p(\bar{f}|\mathcal{D}, \bar{X}) = \frac{p(y|X, \bar{X}, \bar{f})p(\bar{f}|\bar{X})}{p(y)} $$

But I do not see where they get the denominator from, as it is not given. I tried multiplying the likelihood by the prior in the numerator using these Gaussian identities, but I got something totally different, not even close to the stated form of $p(\bar{f}|\mathcal{D}, \bar{X})$. How can you combine equations 5 and 6 with Bayes' rule to get equation 7?

EDIT: Work in progress to get the details of the final derivation correct.

Point 1

I am now very close to getting the full thing, but one part is eluding me, which is probably something trivial. For one, the two proportionality relations involve different $x$ values, so I think there needs to be a subscript to tell the variables apart, like so...

$$ \exp(a(x_1-b)^2) \exp(cx_2^2) \propto \exp((a+c)x_2^2 - 2abx_1).$$

which would mean that the step from line 1 to line 2 of the derivation could be represented as...

$$ \begin{align} & \propto \exp(-\frac{1}{2} (y - L \mu)^T \Sigma^{-1}(y - L \mu))\exp(-\frac{1}{2}\mu^T K^{-1} \mu) \\ &\propto \exp(-\frac{1}{2}(\underbrace{-2\mu^T L^T \Sigma^{-1}y + \mu^T(\Sigma^{-1}+K^{-1})\mu}_\text{Terms that do not include $\mu$ are factorized away})) \\ \end{align} $$

if it is the case that...

$$ \begin{aligned} a &= \Sigma^{-1} \\ b &= \mu^TL^T \\ c &= K^{-1} \\ x_1 &= y \\ x_2 &= \mu \\ \end{aligned} $$

But I do not see how we would end up with $L^T\Sigma^{-1}L + K^{-1}$ inside of the parentheses at the end.

Point 2

I think the final line of the derivation is missing an inverse at the end; is the following correct?

$$ \propto \exp(-\frac{1}{2}\left(\mu - (L^T\Sigma^{-1}L+K^{-1})^{-1} L^T \Sigma^{-1} y\right)^T (L^T\Sigma^{-1}L+K^{-1})^{\color{red}{-1}} \left(\mu - (L^T\Sigma^{-1}L+K^{-1})^{-1} L^T\Sigma^{-1} y\right)). $$


Best Answer

Preamble

I shall use the common Bayesian statistics trick for figuring out posteriors: work with PDF kernels and figure out the normalization constants later. I will also make intensive use of proportionality arithmetic, $\propto$, which simplifies the derivations.

The main logic of $\propto$ relation is expressed as:

$$ C_1\exp(C_2 + h(x)) \propto C_3\exp(C_4 + h(x)) \propto \exp(h(x))$$

for any constants $C_i$, where $h(\cdot)$ is some function of $x$.

Obviously $f(x) \propto h(x)$ implies $h(x) \propto f(x)$.

The following relations are less obvious but will be useful below:

$$ \exp(a(x-b)^2) \propto \exp(ax^2 - 2ab x) \propto \exp(ax(x-2b)),$$ $$ \exp(a(x-b)^2) \exp(cx^2) \propto \exp((a+c)x^2 - 2ab x).$$
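These relations can be checked numerically in the scalar case. The sketch below (with arbitrary illustrative values of $a$, $b$, $c$) confirms the second relation by showing that the two sides differ only by the constant factor $\exp(ab^2)$:

```python
import numpy as np

# Numeric sanity check (scalar case) of
#   exp(a(x-b)^2) * exp(c x^2)  ∝  exp((a+c)x^2 - 2abx);
# completing the square shows the ratio is the constant exp(a b^2).
a, b, c = -0.7, 1.3, -0.4  # arbitrary illustrative values
x = np.linspace(-3, 3, 7)

lhs = np.exp(a * (x - b) ** 2) * np.exp(c * x ** 2)
rhs = np.exp((a + c) * x ** 2 - 2 * a * b * x)

ratio = lhs / rhs
print(np.allclose(ratio, ratio[0]))  # constant ratio ⇒ proportional
```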

Tedious algebra

So, we have normally distributed data $y|\mu \sim \mathcal{N}(L \mu, \Sigma)$ with $\mu\in\mathbb{R}^n$, and a normally distributed mean parameter $\mu \sim \mathcal{N}(0, K)$. The posterior density of $\mu$ is proportional to the product of the PDF kernels of the two:

\begin{align} \overline{p}(\mu|y) \quad & \propto \exp(-\frac{1}{2} (y - L \mu)^T \Sigma^{-1}(y - L \mu))\exp(-\frac{1}{2}\mu^T K^{-1} \mu) \\ &\propto \exp(-\frac{1}{2}(\underbrace{-2\mu^T L^T \Sigma^{-1}y + \mu^T(L^T \Sigma^{-1}L+K^{-1})\mu}_\text{Terms that do not include $\mu$ are factorized away})) \\ &\propto \exp(-\frac{1}{2} \underbrace{\mu^T(L^T\Sigma^{-1}L+K^{-1})}_\text{Factorized inside $\exp$} \left(-2 (L^T\Sigma^{-1}L+K^{-1})^{-1} L^T\Sigma^{-1} y + \mu\right))\\ &\propto \exp(-\frac{1}{2}\left(\mu - (L^T\Sigma^{-1}L+K^{-1})^{-1} L^T \Sigma^{-1} y\right)^T (L^T\Sigma^{-1}L+K^{-1}) \left(\mu - (L^T\Sigma^{-1}L+K^{-1})^{-1} L^T\Sigma^{-1} y\right)). \end{align}

Careful examination of the last line reveals the PDF kernel of a normal random variable:

$$\mu|y \sim \mathcal{N}((L^T\Sigma^{-1}L+K^{-1})^{-1} L^T\Sigma^{-1} y, (L^T\Sigma^{-1}L+K^{-1})^{-1}). $$
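As a sanity check, this posterior can be compared against the standard formulas for conditioning a joint Gaussian, since $(\mu, y)$ are jointly Gaussian with $\operatorname{Cov}(\mu, y) = KL^T$ and $\operatorname{Cov}(y) = \Sigma + LKL^T$. A minimal numpy sketch with random (hypothetical) SPD matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3  # hypothetical sizes: dim(y) = n, dim(mu) = m

# random SPD matrices for Sigma and K, random L and y
A = rng.standard_normal((n, n)); Sigma = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); K = B @ B.T + m * np.eye(m)
L = rng.standard_normal((n, m))
y = rng.standard_normal(n)

Si = np.linalg.inv(Sigma); Ki = np.linalg.inv(K)

# posterior from completing the square, as derived above
P = L.T @ Si @ L + Ki               # posterior precision
cov_post = np.linalg.inv(P)
mean_post = cov_post @ L.T @ Si @ y

# posterior from conditioning the joint Gaussian (mu, y)
S = Sigma + L @ K @ L.T
mean_cond = K @ L.T @ np.linalg.solve(S, y)
cov_cond = K - K @ L.T @ np.linalg.solve(S, L @ K)

print(np.allclose(mean_post, mean_cond), np.allclose(cov_post, cov_cond))
```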

Using the Woodbury identity one can rewrite the posterior covariance with an eye on simplifying it in the context of the paper:

$$ (L^T\Sigma^{-1}L+K^{-1})^{-1} = K - K L^T(\Sigma + L K L^T)^{-1} L K $$

In the context of the paper,

\begin{align} \mu &= \overline{f}, \\ \Sigma &= \Lambda + \sigma^2 I, \\ K &= K_M,\\ L &= K_{NM}K_M^{-1}, \\ KL^T &= K_{MN}, \\ L K L^T &= K_{NM} K_M^{-1} K_{MN}, \end{align}

and the posterior covariance matrix equals

\begin{align} &K_M - K_{MN} (\Lambda + \sigma^2I + K_{NM} K_M^{-1} K_{MN} )^{-1} K_{NM} \\ =& K_M (K_M^{-1} - K_M^{-1} K_{MN} (\Lambda + \sigma^2I + K_{NM} K_M^{-1} K_{MN} )^{-1} K_{NM} K_M^{-1} ) K_M\\ =& K_M (K_M + K_{MN} (\Lambda + \sigma^2 I)^{-1}K_{NM})^{-1} K_M \\ = & K_M Q_M^{-1}K_M. \end{align}
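This chain of equalities can also be verified numerically. The sketch below builds random matrices of the right shapes (the sizes $N$, $M$ and the noise term $\Lambda + \sigma^2 I$ are arbitrary illustrative choices) and checks that the Woodbury form of the posterior covariance equals $K_M Q_M^{-1} K_M$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 3  # hypothetical sizes: N data points, M pseudo-inputs

A = rng.standard_normal((M, M)); K_M = A @ A.T + M * np.eye(M)
K_NM = rng.standard_normal((N, M)); K_MN = K_NM.T
noise = np.diag(rng.uniform(0.1, 1.0, N)) + 0.5 * np.eye(N)  # Lambda + sigma^2 I

# left-hand side: Woodbury form of the posterior covariance
S = noise + K_NM @ np.linalg.solve(K_M, K_MN)
lhs = K_M - K_MN @ np.linalg.solve(S, K_NM)

# right-hand side: K_M Q_M^{-1} K_M with Q_M = K_M + K_MN (Lambda + s^2 I)^{-1} K_NM
Q_M = K_M + K_MN @ np.linalg.solve(noise, K_NM)
rhs = K_M @ np.linalg.solve(Q_M, K_M)

print(np.allclose(lhs, rhs))  # expected: True
```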

(Woodbury identity is used again to get the third line.)

The mean of the posterior is obtained in a similar fashion.
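For completeness, here is a sketch of that computation: substituting $L^T\Sigma^{-1} = K_M^{-1}K_{MN}(\Lambda + \sigma^2 I)^{-1}$ and the posterior covariance $(L^T\Sigma^{-1}L+K^{-1})^{-1} = K_M Q_M^{-1} K_M$ into the posterior mean,

$$ \begin{aligned} (L^T\Sigma^{-1}L+K^{-1})^{-1} L^T\Sigma^{-1} y &= K_M Q_M^{-1} K_M \cdot K_M^{-1} K_{MN} (\Lambda + \sigma^2 I)^{-1} y \\ &= K_M Q_M^{-1} K_{MN} (\Lambda + \sigma^2 I)^{-1} y, \end{aligned} $$

which is exactly the mean in equation 7.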