How to calculate the marginal distribution in probabilistic principal component analysis


In Bishop's book Pattern Recognition and Machine Learning, equation 12.33 states:

$\mathbf{x} = \mathbf{W} \mathbf{z} + \boldsymbol\mu + \boldsymbol\epsilon$

Here $\mathbf{z}$ has a normal distribution $N(\mathbf{z} | \mathbf{0}, \mathbf{I})$ and $\boldsymbol\epsilon$ has a normal distribution $N(\boldsymbol\epsilon | \mathbf{0}, \sigma^2 \mathbf{I})$.

Thus, the conditional probability $p(\mathbf{x}|\mathbf{z})$ is equal to

$p(\mathbf{x}|\mathbf{z}) = N(\mathbf{x}|\mathbf{W}\mathbf{z} + \boldsymbol \mu, \sigma^2 \mathbf{I})$

The marginal distribution $p(\mathbf{x})$ can then be computed with

$p(\mathbf{x}) = \int p(\mathbf{x}|\mathbf{z}) p(\mathbf{z}) d\mathbf{z}$

and according to the book this is equal to $N(\mathbf{x}| \boldsymbol\mu, \mathbf{C})$ with $\mathbf{C} = \mathbf{W} \mathbf{W}^T + \sigma^2 \mathbf{I}$. How did the author calculate the integral to arrive at this result?

Best answer:

The mean and variance of the marginal density can be directly evaluated from the expression $\mathbf{x}=\mathbf{Wz}+\boldsymbol{\mu}+\boldsymbol{\epsilon}$.
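The direct approach can be checked numerically. Below is a small Monte Carlo sketch (assuming numpy; $\mathbf{W}$, $\boldsymbol\mu$ and $\sigma$ are arbitrary choices for illustration) that samples $\mathbf{x}=\mathbf{Wz}+\boldsymbol\mu+\boldsymbol\epsilon$ and compares the empirical mean and covariance with $\boldsymbol\mu$ and $\mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I}$:

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 4, 2                      # observed and latent dimensions (arbitrary)
W = rng.normal(size=(d, m))      # arbitrary loading matrix
mu = rng.normal(size=d)
sigma = 0.5

n = 200_000
z = rng.normal(size=(n, m))            # z ~ N(0, I)
eps = sigma * rng.normal(size=(n, d))  # eps ~ N(0, sigma^2 I)
x = z @ W.T + mu + eps                 # x = W z + mu + eps

C = W @ W.T + sigma**2 * np.eye(d)     # claimed marginal covariance

print(np.max(np.abs(x.mean(axis=0) - mu)))          # small (sampling error)
print(np.max(np.abs(np.cov(x, rowvar=False) - C)))  # small (sampling error)
```

Both deviations shrink as the sample size grows, consistent with the marginal $N(\mathbf{x}|\boldsymbol\mu, \mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I})$.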

Nevertheless, the more lengthy approach of evaluating the integral is a useful exercise in how to deal with integrals of exponential functions whose argument is a quadratic function of vectors, as is the case with the multivariate Normal distribution.

The objective is to calculate the integral $$\begin{align}p(\mathbf{x})=&\int \color{blue}{p(\mathbf{x}|\mathbf{z})}\color{red}{p(\mathbf{z})}d\mathbf{z}\\=&\int\color{blue}{\frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left(-\frac{1}{2\sigma^2}(\mathbf{x}-\mathbf{Wz}-\boldsymbol{\mu})^T(\mathbf{x}-\mathbf{Wz}-\boldsymbol{\mu})\right)}\color{red}{\frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\mathbf{z}^T\mathbf{z}\right)}d\mathbf{z}\end{align}$$ Expanding the arguments of the exponential functions and rearranging, we end up with $$p(\mathbf{x})=\int\frac{1}{(2\pi)^n\sigma^n}\exp\left(-\frac{1}{2}\left(\mathbf{z}^T\left[\frac{1}{\sigma^2}\mathbf{W}^T\mathbf{W}+\mathbf{I}\right]\mathbf{z}-\frac{2}{\sigma^2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{Wz}-\frac{2}{\sigma^2}\boldsymbol{\mu}^T\mathbf{x}\\+\frac{1}{\sigma^2}\mathbf{x}^T\mathbf{x}+\frac{1}{\sigma^2}\boldsymbol{\mu}^T\boldsymbol{\mu}\right)\right)d\mathbf{z} \quad \text{Eq. (1)}$$ To aid in integration with respect to $\mathbf{z}$, we can conceptually regard the integrand as a Normal density in the random variable $\mathbf{z}$, with mean $\boldsymbol{\mu_z}$ and variance matrix $\mathbf{C_z}$, for which the argument of the exponential function would be $$\exp\left(-\frac{1}{2}(\mathbf{z}-\boldsymbol{\mu_z})^T\mathbf{C_z}^{-1}(\mathbf{z}-\boldsymbol{\mu_z})\right)=\exp\left(-\frac{1}{2}\left(\mathbf{z}^T\mathbf{C_z}^{-1}\mathbf{z}-2\boldsymbol{\mu_z}^T\mathbf{C_z}^{-1}\mathbf{z}+\color{red}{\boldsymbol{\mu_z}^T\mathbf{C_z}^{-1}\boldsymbol{\mu_z}}\right)\right)$$ Comparing this expression with the integrand in Eq. (1), except for the term highlighted in red (which we will add and subtract from the integrand, as in completing the square), we can match the mean and variance to the appropriate terms in the integrand: $$\begin{align} \mathbf{C_z}^{-1}&=\frac{1}{\sigma^2}\mathbf{W}^T\mathbf{W}+\mathbf{I}=\frac{1}{\sigma^2}(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})\\ 
\boldsymbol{\mu_z}^T\mathbf{C_z}^{-1}&=\frac{1}{\sigma^2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{W}\\\Rightarrow \color{red}{\boldsymbol{\mu_z}^T\mathbf{C_z}^{-1}\boldsymbol{\mu_z}}&=\frac{1}{\sigma^2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{W}(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})^{-1}\mathbf{W}^T(\mathbf{x}-\boldsymbol{\mu})\end{align}$$ Thus we have the following (where the $\mathbf{z}$ term is a Normal density and so integrates to unity): $$p(\mathbf{x})\propto\exp\left(-\frac{1}{2}\left(-\color{red}{\boldsymbol{\mu_z}^T\mathbf{C_z}^{-1}\boldsymbol{\mu_z}}-\frac{2}{\sigma^2}\boldsymbol{\mu}^T\mathbf{x}+\frac{1}{\sigma^2}\mathbf{x}^T\mathbf{x}+\frac{1}{\sigma^2}\boldsymbol{\mu}^T\boldsymbol{\mu}\right)\right)\int\mathcal{N}(\mathbf{z}|\boldsymbol{\mu_z},\mathbf{C_z})d\mathbf{z}\\=\exp\left(-\frac{1}{2}\left(-\frac{1}{\sigma^2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{W}(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})^{-1}\mathbf{W}^T(\mathbf{x}-\boldsymbol{\mu})-\frac{2}{\sigma^2}\boldsymbol{\mu}^T\mathbf{x}+\frac{1}{\sigma^2}\mathbf{x}^T\mathbf{x}+\frac{1}{\sigma^2}\boldsymbol{\mu}^T\boldsymbol{\mu}\right)\right)\\=\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{C_x}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$ which is in the form of a Normal distribution in $\mathbf{x}$, with mean simply $\boldsymbol{\mu}$.
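The completing-the-square step above can be sanity-checked numerically: for an arbitrary test point, the leftover quadratic $\frac{1}{\sigma^2}(\mathbf{x}-\boldsymbol\mu)^T(\mathbf{x}-\boldsymbol\mu)-\boldsymbol{\mu_z}^T\mathbf{C_z}^{-1}\boldsymbol{\mu_z}$ should equal $(\mathbf{x}-\boldsymbol\mu)^T\mathbf{C_x}^{-1}(\mathbf{x}-\boldsymbol\mu)$ with $\mathbf{C_x}=\mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I}$. A minimal sketch, assuming numpy and arbitrary values for $\mathbf{W}$, $\boldsymbol\mu$, $\sigma$ and $\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 4, 2
W = rng.normal(size=(d, m))
mu = rng.normal(size=d)
sigma = 0.5
x = rng.normal(size=d)          # arbitrary test point

a = x - mu
# quadratic term picked up from completing the square in z
quad_z = a @ W @ np.linalg.inv(W.T @ W + sigma**2 * np.eye(m)) @ W.T @ a / sigma**2
# leftover exponent terms: (1/sigma^2)(x-mu)^T(x-mu) minus the red term
lhs = a @ a / sigma**2 - quad_z
# claimed result: (x-mu)^T C_x^{-1} (x-mu) with C_x = W W^T + sigma^2 I
rhs = a @ np.linalg.inv(W @ W.T + sigma**2 * np.eye(d)) @ a
print(abs(lhs - rhs))   # ~ 0 up to floating point
```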

It remains to evaluate the variance matrix $\mathbf{C_x}$ of $\mathbf{x}$, where
$$\mathbf{C_x}^{-1}=\frac{1}{\sigma^2}(\mathbf{I}-\mathbf{W}(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})^{-1}\mathbf{W}^T)$$ We use the Woodbury matrix identity, restated as follows (for suitably sized matrices $\mathbf{A}$, $\mathbf{U}$, $\mathbf{C}$ and $\mathbf{V}$) $$\mathbf{(A + UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}}$$ Setting $\mathbf{A}=\sigma^2\mathbf{I}$, $\mathbf{C}=\mathbf{I}$, $\mathbf{U=W}$ and $\mathbf{V=W}^T$, and using $\left(\mathbf{I}+\frac{1}{\sigma^2}\mathbf{W}^T\mathbf{W}\right)^{-1}=\sigma^2(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})^{-1}$, leads to $$(\sigma^2\mathbf{I}+\mathbf{W}\mathbf{W}^T)^{-1}=\frac{1}{\sigma^2}\mathbf{I}-\frac{1}{\sigma^2}\mathbf{I}\,\mathbf{W}\left(\mathbf{I}+\frac{1}{\sigma^2}\mathbf{W}^T\mathbf{W}\right)^{-1}\mathbf{W}^T\,\frac{1}{\sigma^2}\mathbf{I}\\=\frac{1}{\sigma^2}(\mathbf{I}-\mathbf{W}(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})^{-1}\mathbf{W}^T)\\=\mathbf{C_x}^{-1}$$ resulting in $$\mathbf{C_x}=\mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I}$$ which is what we set out to show.
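This application of the Woodbury identity is easy to verify numerically. The sketch below (assuming numpy, with an arbitrary $\mathbf{W}$ and $\sigma$) compares $(\sigma^2\mathbf{I}+\mathbf{W}\mathbf{W}^T)^{-1}$ against $\frac{1}{\sigma^2}(\mathbf{I}-\mathbf{W}(\mathbf{W}^T\mathbf{W}+\sigma^2\mathbf{I})^{-1}\mathbf{W}^T)$:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 5, 3
W = rng.normal(size=(d, m))
sigma = 0.7

# left side: direct d x d inverse
lhs = np.linalg.inv(sigma**2 * np.eye(d) + W @ W.T)
# right side: Woodbury form, requiring only an m x m inverse
rhs = (np.eye(d) - W @ np.linalg.inv(W.T @ W + sigma**2 * np.eye(m)) @ W.T) / sigma**2

print(np.max(np.abs(lhs - rhs)))   # ~ 0 up to floating point
```

Note that the Woodbury form only inverts an $m \times m$ matrix, which is exactly why it is useful in PPCA, where the latent dimension is typically much smaller than the data dimension.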

Summarising, the marginal density $p(\mathbf{x})$ is Normal with mean $\boldsymbol{\mu}$ and variance matrix $\mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I}$.
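As a final end-to-end check, the marginal can be computed by direct numerical integration of $\int p(\mathbf{x}|z)p(z)\,dz$ for a one-dimensional latent variable and compared with the closed form $N(\mathbf{x}|\boldsymbol\mu, \mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I})$. A sketch assuming numpy and scipy, with arbitrary parameter values:

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(3)
d = 2
W = rng.normal(size=(d, 1))   # single latent dimension so the integral is 1-D
mu = rng.normal(size=d)
sigma = 0.5
x = rng.normal(size=d)        # arbitrary evaluation point

def integrand(z):
    # p(x|z) p(z) for scalar z
    return (stats.multivariate_normal.pdf(x, mean=W[:, 0] * z + mu,
                                          cov=sigma**2 * np.eye(d))
            * stats.norm.pdf(z))

# numerically marginalise out z (the integrand is negligible beyond +/-10)
numeric, _ = integrate.quad(integrand, -10, 10)
# closed-form marginal density N(x | mu, W W^T + sigma^2 I)
closed = stats.multivariate_normal.pdf(x, mean=mu,
                                       cov=W @ W.T + sigma**2 * np.eye(d))
print(abs(numeric - closed))   # ~ 0 up to quadrature error
```

The two values agree to within the quadrature tolerance, confirming the result of the derivation.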