Should a Gram matrix be $R = Q^TQ$ or $R = QQ^T$?


I'm reading a paper about Fisherfaces that uses kernel PCA together with LDA.

The problem I have is understanding $\Phi(x)$. I know that it is an arbitrary function of the vector $x$. But I don't understand why the paper says, in section 2, OUTLINE OF KPCA AND KFD:

As a result, a pattern in the original input space $\mathbb{R}^n$ is mapped into a potentially much higher dimensional feature vector in the feature space H.

So that means if I take a vector $x \in \mathbb{R}^n$, then $\Phi(x)$ will have a much higher dimension than $n$... well, OK.

But then it says

Given a set of $M$ training samples $x_1, x_2, \dots, x_M$ in $\mathbb{R}^n$

So that means if I collect all $x_i$, $i = 1, 2, \dots, M$, as columns, I get a matrix of size $n \times M$.

The interesting part is

To obtain the expansion coefficients, let us denote $Q = [\Phi(x_1), \dots, \Phi(x_M)]$ and form an $M \times M$ Gram matrix $R = Q^TQ$, whose elements can be determined by virtue of kernel tricks:
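Just to check the sizes for myself, here is a small numpy sketch of forming $R$ via the kernel trick (my own code, not the paper's; the RBF kernel is just an example choice):

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # k(a, b) = exp(-gamma * ||a - b||^2): the kernel trick gives us
    # <Phi(a), Phi(b)> without ever computing Phi explicitly
    return np.exp(-gamma * np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
n, M = 100, 10                 # sample dimension n, number of samples M
X = rng.normal(size=(n, M))    # columns are the samples x_1 ... x_M

# Gram matrix R = Q^T Q, computed element-wise through the kernel
R = np.array([[rbf_kernel(X[:, i], X[:, j]) for j in range(M)]
              for i in range(M)])
print(R.shape)  # (10, 10) -- M x M, independent of n
```

So the size of $R$ depends only on the number of samples $M$, not on $n$ or on the dimension of the feature space.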

The goal is to compute the eigenvectors $\gamma_1, \gamma_2, \dots, \gamma_m$ of $R$ (with eigenvalues $\lambda_1, \dots, \lambda_m$), which is equivalent to computing the eigenvectors $\beta_1, \beta_2, \dots, \beta_m$ of $S^{\Phi}_t$. The relationship is the following formula:

$$\beta_j = \frac{1}{\sqrt{\lambda_j}}Q\gamma_j$$
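I verified this relationship numerically for myself with an explicit finite-dimensional $Q$ (a stand-in for $[\Phi(x_1), \dots, \Phi(x_M)]$; I'm using $QQ^T$ in place of $S^{\Phi}_t$, i.e. ignoring centering):

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 8, 5
Q = rng.normal(size=(n, M))     # stand-in for [Phi(x_1), ..., Phi(x_M)]

R = Q.T @ Q                     # M x M Gram matrix
lam, Gamma = np.linalg.eigh(R)  # eigenvalues lam[j], eigenvectors Gamma[:, j]

S = Q @ Q.T                     # n x n scatter-like matrix Q Q^T
j = M - 1                       # pick the largest eigenpair
beta = Q @ Gamma[:, j] / np.sqrt(lam[j])

# beta is an eigenvector of Q Q^T with the same eigenvalue lam[j]:
# (Q Q^T) Q gamma = Q (R gamma) = lam * Q gamma
print(np.allclose(S @ beta, lam[j] * beta))  # True
```

So each $\gamma_j$ of the small $M \times M$ matrix gives an eigenvector $\beta_j$ of the big matrix with the same eigenvalue, which is the whole point of working with $R = Q^TQ$.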

With the eigenvectors of $S^{\Phi}_t$, you can then project $x \in \mathbb{R}^n$ into an $m$-dimensional space:

$$y = P^T\Phi(x),\space P = (\beta_1, \beta_2, \dots, \beta_m)$$
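As another sanity check (mine, not the paper's), the projection $y = P^T\Phi(x)$ never needs $\Phi(x)$ explicitly, because each $\beta_j^T\Phi(x)$ reduces to kernel evaluations $k(x_i, x)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 8, 5
Q = rng.normal(size=(n, M))           # columns play the role of Phi(x_i)
phi_x = rng.normal(size=n)            # Phi(x) for a new sample x

lam, Gamma = np.linalg.eigh(Q.T @ Q)
P = Q @ Gamma / np.sqrt(lam)          # columns are beta_1 ... beta_M

# direct projection y = P^T Phi(x)
y_direct = P.T @ phi_x

# kernel-only version: y_j = (1/sqrt(lam_j)) * sum_i Gamma[i, j] * k(x_i, x)
k_vec = Q.T @ phi_x                   # k(x_i, x) = <Phi(x_i), Phi(x)>
y_kernel = (Gamma.T @ k_vec) / np.sqrt(lam)

print(np.allclose(y_direct, y_kernel))  # True
```

So the projected vector $y$ always has $m \le M$ entries, one per eigenvector kept.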

The problem

If you notice, the vector $x$ has dimension $n$ and $\Phi(x)$ maps $x$ into a higher dimension, which would mean $m > n$. But in reality, $m$ is less than $n$, because $x$ is a long vector and $m$ is just the total number of samples:

Given a set of $M$ training samples $x_1, x_2, \dots, x_M$ in $\mathbb{R}^n$

In this case, one sample $x_i$ will be a vectorized image stored as a column vector. Those tend to be very long.

Anyway, if $n > m$, then this equation seems impossible to compute:

$$y = P^T\Phi(x),\space P = (\beta_1, \beta_2, \dots, \beta_m)$$

Because $\Phi$ maps $x$ into a higher dimension, but $n$ is already larger than $m$. Does that mean $\Phi(x)$ must reduce the dimension of $x$? That doesn't sound correct either.

Proposed solution:

So the solution must be to change the Gram matrix. From:

$$R = Q^TQ$$

To:

$$R = QQ^T$$

Because now we get an $R$ matrix of size $n \times n$ instead of $m \times m$. Then $\Phi$ would not even need to map $x$ into a higher or lower dimension.
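Here is a quick numpy comparison of the two candidate shapes (again my own check, treating $\Phi$ as the identity just so $Q$ is the $n \times M$ matrix I described above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, M = 100, 10
Q = rng.normal(size=(n, M))   # treating Phi as identity, so Q is n x M

A = Q.T @ Q                   # M x M
B = Q @ Q.T                   # n x n
print(A.shape, B.shape)       # (10, 10) (100, 100)

# the two products share the same nonzero eigenvalues
ea = np.sort(np.linalg.eigvalsh(A))[::-1]
eb = np.sort(np.linalg.eigvalsh(B))[::-1]
print(np.allclose(ea[:M], eb[:M]))  # True
```

Interestingly, the nonzero eigenvalues agree either way; only the sizes of the two matrices differ.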

Question:

What should I do? Which one is correct? I think this paper does not explain the sizes correctly. Are they really assuming that the number of samples should be larger than the dimension of the $x_i$ vectors?