Rotation in Probabilistic PCA


Could someone shed some light on why there is an arbitrary rotation in PPCA? Compared to conventional PCA, PPCA fitted with EM appears to land on a different set of basis vectors every time it is estimated. When this basis matrix is used to project data onto the latent space, the projections show some rotation.

Is there a way such rotation can be controlled?



Accepted answer:

Bishop's textbook Pattern Recognition and Machine Learning provides a thorough introduction to probabilistic PCA (PPCA); see §12.2. In the following, I will use his notation and I will reference the equations in the book.

The generative model of PPCA is defined as follows (eq. 12.33): $$\mathbf{x} = \mathbf{W} \mathbf{z} + \mathbf{\mu} + \mathbf{\epsilon}$$ where

  • $\mathbf{x}$ is the observed $D$-dimensional random variable
  • $\mathbf{z}$ is a latent $M$-dimensional random variable with standard normal distribution, $p(\mathbf{z}) = \mathcal{N}(\mathbf{z}|\mathbf{0}, \mathbf{I})$
  • $\mathbf{\epsilon}$ is a noise vector drawn from a zero-mean $D$-dimensional normal distribution with covariance $\sigma^2 \mathbf{I}$
  • $\mathbf{W}$, $\sigma^2$ and $\mathbf{\mu}$ are parameters which are estimated by maximizing the log-likelihood of the data.
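As a minimal sketch of this generative model (the dimensions and parameter values below are arbitrary, chosen only for illustration), we can sample data and check that the marginal covariance of $\mathbf{x}$ is $\mathbf{C} = \mathbf{W}\mathbf{W}^\intercal + \sigma^2 \mathbf{I}$ (eq. 12.36):

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, N = 5, 2, 100_000       # observed dim, latent dim, sample count

# Hypothetical ground-truth parameters (illustrative values only)
W = rng.standard_normal((D, M))
mu = rng.standard_normal(D)
sigma2 = 0.1

# x = W z + mu + eps, with z ~ N(0, I_M) and eps ~ N(0, sigma2 * I_D)
Z = rng.standard_normal((N, M))
E = np.sqrt(sigma2) * rng.standard_normal((N, D))
X = Z @ W.T + mu + E

# Marginal covariance of x: C = W W^T + sigma2 * I  (eq. 12.36)
C = W @ W.T + sigma2 * np.eye(D)
print(np.allclose(np.cov(X.T), C, atol=0.1))
```

The sample covariance matches $\mathbf{C}$ up to sampling noise, which is the only way the data "sees" $\mathbf{W}$.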

The rotation invariance of the solution ($\mathbf{W}$ in our case) is caused by the fact that the distribution $p(\mathbf{x})$ depends on $\mathbf{W}\mathbf{W}^\intercal$. Any rotation of $\mathbf{W}$, that is $\tilde{\mathbf{W}} \leftarrow \mathbf{W} \mathbf{R}$ with $\mathbf{R}$ an orthogonal matrix, gives the same log likelihood, $\log p(\mathbf{x})$, since $\tilde{\mathbf{W}} \tilde{\mathbf{W}}^\intercal = \mathbf{W} \mathbf{R} \mathbf{R}^\intercal \mathbf{W}^\intercal = \mathbf{W} \mathbf{W}^\intercal$ (see eq. 12.39). Bishop notes:
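This invariance is easy to verify numerically. The sketch below (arbitrary illustrative dimensions) builds a random orthogonal $\mathbf{R}$ and checks that $\tilde{\mathbf{W}} = \mathbf{W}\mathbf{R}$ yields the same $\mathbf{W}\mathbf{W}^\intercal$, and hence the same model covariance and log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 5, 2
W = rng.standard_normal((D, M))

# Random orthogonal M x M matrix R, via QR decomposition
R, _ = np.linalg.qr(rng.standard_normal((M, M)))

W_tilde = W @ R
# (W R)(W R)^T = W R R^T W^T = W W^T, so C = W W^T + sigma^2 I
# -- and therefore log p(x) -- is unchanged (eq. 12.39)
print(np.allclose(W_tilde @ W_tilde.T, W @ W.T))  # True
```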

[I]f the maximum likelihood solution is found by numerical optimization of the likelihood function, for instance using an algorithm such as conjugate gradients or through the EM algorithm, then the resulting value of $\mathbf{R}$ is essentially arbitrary.

However, as you have shown with your plots, the projection of observed data $\mathbf{x}$ to the latent space $\mathbf{z}$ is also determined only up to a rotation. This fact can be shown by plugging into the posterior mean formula $\mathbb{E}\left[\mathbf{z}|\mathbf{x}\right]$ (eqs. 12.48 and 12.50) the maximum likelihood solution $\mathbf{W}_{\mathrm{ML}}$ (eq. 12.45) and letting the variance $\sigma^2 \to 0$ (to make the model similar to standard PCA). We obtain that PPCA projects a point $\mathbf{x}$ to $$\mathbf{R}^\intercal \mathbf{L}_M^{-1/2} \mathbf{U}_M^\intercal (\mathbf{x} - \mathbf{\mu})$$ as opposed to the standard PCA algorithm, which projects $\mathbf{x}$ to $$\mathbf{L}_M^{-1/2} \mathbf{U}_M^\intercal (\mathbf{x} - \mathbf{\mu})$$ (as indicated by eq. 12.24).
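The following sketch checks this claim numerically (toy dimensions; in the $\sigma^2 \to 0$ limit the posterior mean reduces to $(\mathbf{W}^\intercal\mathbf{W})^{-1}\mathbf{W}^\intercal(\mathbf{x}-\mathbf{\mu})$, by eqs. 12.48 and 12.50). An arbitrary rotation is baked into $\mathbf{W}_{\mathrm{ML}}$, mimicking what EM might return, and the PPCA projection comes out as exactly $\mathbf{R}^\intercal$ times the PCA projection:

```python
import numpy as np

rng = np.random.default_rng(2)
D, M, N = 5, 2, 500
X = rng.standard_normal((N, D)) @ rng.standard_normal((D, D))
mu = X.mean(axis=0)

# Top-M eigendecomposition of the sample covariance: U_M, L_M
S = np.cov(X.T, bias=True)
evals, evecs = np.linalg.eigh(S)
order = np.argsort(evals)[::-1][:M]
L = evals[order]            # top-M eigenvalues
U = evecs[:, order]         # corresponding eigenvectors

# An arbitrary rotation R, as EM might implicitly produce
R, _ = np.linalg.qr(rng.standard_normal((M, M)))

# Maximum-likelihood W in the sigma^2 -> 0 limit (eq. 12.45)
W = U @ np.diag(np.sqrt(L)) @ R

x = X[0]
# Posterior mean E[z|x] as sigma^2 -> 0: (W^T W)^{-1} W^T (x - mu)
z_ppca = np.linalg.solve(W.T @ W, W.T @ (x - mu))
# Standard PCA (whitened) projection: L^{-1/2} U^T (x - mu)
z_pca = np.diag(1 / np.sqrt(L)) @ U.T @ (x - mu)

# The two projections differ exactly by the rotation R^T
print(np.allclose(z_ppca, R.T @ z_pca))  # True
```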

You ask:

Is there a way such rotation can be controlled?

Indeed, Tipping and Bishop observe that we can recover $\mathbf{R}^\intercal$ as the matrix whose columns are the eigenvectors of $\mathbf{W}_{\mathrm{ML}}^\intercal \mathbf{W}_{\mathrm{ML}}$ (see eq. 32 in their paper). So to obtain the projection onto the true principal axes, you multiply the projected points by $\mathbf{R}$ to undo the rotation.
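As a sketch (toy dimensions, taking $\sigma^2 \to 0$ so that $\mathbf{W}_{\mathrm{ML}}^\intercal \mathbf{W}_{\mathrm{ML}} = \mathbf{R}^\intercal \mathbf{L}_M \mathbf{R}$), the rotation can be recovered from an eigendecomposition, up to the usual per-eigenvector sign ambiguity:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 2
L = np.array([4.0, 1.0])          # hypothetical top-M eigenvalues

# Suppose EM returned W_ML with an unknown rotation R baked in
R_true, _ = np.linalg.qr(rng.standard_normal((M, M)))
U = np.eye(3)[:, :M]              # toy principal directions, D = 3
W_ml = U @ np.diag(np.sqrt(L)) @ R_true

# W_ML^T W_ML = R^T L R, so its eigenvectors are the columns of R^T
evals, evecs = np.linalg.eigh(W_ml.T @ W_ml)
order = np.argsort(evals)[::-1]   # sort descending to match L's order
R_rec = evecs[:, order].T         # recovered R, rows fixed up to sign

# R_rec matches R_true once per-row signs are aligned
signs = np.sign(np.sum(R_rec * R_true, axis=1))
print(np.allclose(signs[:, None] * R_rec, R_true))  # True
```

Multiplying the projected points by the recovered $\mathbf{R}$ (after fixing the sign ambiguities, e.g. by aligning with a reference fit) then yields the standard PCA coordinates.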