Trying to learn about kernel PCA but cannot understand some math.


I'm trying to learn about kernel PCA by reading the paper by its creators (I assume): "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller, Technical Report No. 44, 1996.

I don't understand the step on page 3 of that PDF where equations (7) and (8) are combined to obtain equation (9), that is:
if $$\lambda (\Phi(x_k)\cdot \mathbf V)=(\Phi(x_k)\cdot \bar C\mathbf V)\; \text{for all $k=1,\ldots, M $}$$ and $$ \mathbf V = \sum_{i=1}^M a_i\Phi(x_i) $$ we get $$ \lambda \sum_{i=1}^M a_i(\Phi(x_k)\cdot \Phi(x_i)) = \frac 1 M \sum_{i=1}^M a_i(\Phi(x_k)\cdot \sum_{j=1}^M \Phi(x_j))(\Phi(x_j) \cdot \Phi(x_i)) $$

using the covariance matrix $\bar C$ in the feature space for our $M$ centered observations: $$ \bar C = \frac 1 M \sum_{j=1}^M \Phi(x_j) \Phi(x_j)^\mathsf T $$

What happened to the transposed factor $\Phi(x_j)^\mathsf T$ from the sum in the covariance matrix when $\bar C$ was substituted into the left-hand equation?
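To at least convince myself that equation (9) is true, I wrote a small NumPy sketch (my own toy check, not from the paper) with random vectors standing in for the $\Phi(x_i)$. It evaluates $\Phi(x_k)\cdot(\bar C\mathbf V)$ once via the matrix $\bar C$ and once via the double sum on the right-hand side of (9):

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 5, 3  # number of samples, feature-space dimension (toy sizes)
Phi = rng.standard_normal((M, d))  # row i stands in for Phi(x_i), assumed centered
a = rng.standard_normal(M)         # arbitrary coefficients a_i

# V = sum_i a_i Phi(x_i)
V = Phi.T @ a

# Covariance matrix: C_bar = (1/M) sum_j Phi(x_j) Phi(x_j)^T
C_bar = (Phi.T @ Phi) / M

k = 2  # any fixed sample index

# Left: Phi(x_k) . (C_bar V), using the matrix form of C_bar
lhs = Phi[k] @ (C_bar @ V)

# Right: (1/M) sum_i a_i sum_j (Phi(x_k).Phi(x_j)) (Phi(x_j).Phi(x_i)),
# i.e. the double sum of dot products in equation (9)
rhs = sum(a[i] * sum((Phi[k] @ Phi[j]) * (Phi[j] @ Phi[i]) for j in range(M))
          for i in range(M)) / M

print(np.isclose(lhs, rhs))  # prints True
```

The two sides do agree numerically, so the identity holds; I just don't see the algebraic step that turns $\Phi(x_j)\Phi(x_j)^\mathsf T$ into the products of dot products.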