I'm reading a paper about Fisherfaces that uses kernel PCA together with LDA.
The problem I have is understanding $\Phi(x)$. I know it is an arbitrary mapping applied to the vector $x$, but I don't understand why the paper says, in Section 2 (OUTLINE OF KPCA AND KFD):
As a result, a pattern in the original input space $\mathbb{R}^n$ is mapped into a potentially much higher dimensional feature vector in the feature space $H$.
So that means if I take a vector $x \in \mathbb{R}^n$, then $\Phi(x)$ will have a much higher dimension than $n$... well, OK.
But then it says
Given a set of $M$ training samples $x_1, x_2, \dots, x_M$ in $\mathbb{R}^n$
So if I collect all $x_i$, $i = 1, 2, \dots, M$, into a matrix, it will have size $n \times M$.
The interesting part is:
To obtain the expansion coefficients, let us denote $Q = [\Phi(x_1), \dots, \Phi(x_M)]$ and form an $M \times M$ Gram matrix $R = Q^TQ$, whose elements can be determined by virtue of kernel tricks:
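To make this concrete, here is a small sketch of my own (not from the paper; I'm assuming an RBF kernel just as an example) showing that $R$ can be formed with a kernel function alone, without ever computing $\Phi$ explicitly:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)), the implicit inner
    # product <Phi(a), Phi(b)> for some (infinite-dimensional) Phi
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n, M = 10, 5                          # input dimension n, number of samples M
X = rng.standard_normal((n, M))       # columns are the samples x_1, ..., x_M

# Gram matrix R[i, j] = k(x_i, x_j) = Phi(x_i)^T Phi(x_j);
# its size is M x M no matter how high-dimensional Phi's output is
R = np.array([[rbf_kernel(X[:, i], X[:, j]) for j in range(M)]
              for i in range(M)])
print(R.shape)   # (5, 5) -- depends only on M, not on n or dim(H)
```

The point of the kernel trick is exactly that the size of $R$ depends only on the number of samples $M$.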
The goal is to compute the eigenvectors $\gamma_1, \gamma_2, \dots, \gamma_M$ of $R$ (with eigenvalues $\lambda_1, \dots, \lambda_M$), which correspond to the eigenvectors $\beta_1, \beta_2, \dots, \beta_M$ of $S^{\Phi}_t$. The relationship is the following formula:
$$\beta_j = \frac{1}{\sqrt{\lambda_j}} Q \gamma_j$$
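To convince myself of this relationship, I tried a numerical sketch (my own, assuming the simplest case $\Phi(x) = x$, i.e. a linear kernel, so that $Q$ is just the data matrix and everything can be checked explicitly):

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 8, 4
Q = rng.standard_normal((n, M))   # stand-in for [Phi(x_1), ..., Phi(x_M)]

R = Q.T @ Q                       # M x M Gram matrix
S = Q @ Q.T                       # n x n matrix playing the role of S_t^Phi

lam, gamma = np.linalg.eigh(R)    # eigenvalues lam_j, eigenvectors gamma_j of R
beta = Q @ gamma / np.sqrt(lam)   # beta_j = Q gamma_j / sqrt(lam_j), as columns

# each beta_j is a unit eigenvector of S with the same eigenvalue lam_j
for j in range(M):
    assert np.allclose(S @ beta[:, j], lam[j] * beta[:, j])
    assert np.isclose(np.linalg.norm(beta[:, j]), 1.0)
```

The algebra behind the asserts: $S\beta_j = QQ^TQ\gamma_j/\sqrt{\lambda_j} = QR\gamma_j/\sqrt{\lambda_j} = \lambda_j \beta_j$, and the $1/\sqrt{\lambda_j}$ factor makes $\beta_j$ unit length.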
With the eigenvectors of $S^{\Phi}_t$, you can then project $x \in \mathbb{R}^n$ into an $M$-dimensional space:
$$y = P^T\Phi(x), \quad P = (\beta_1, \beta_2, \dots, \beta_M)$$
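For what it's worth, here is another sketch of mine (again assuming $\Phi(x) = x$ so everything is checkable) showing that this projection can be computed entirely from the kernel vector $k(x)_i = \langle\Phi(x_i), \Phi(x)\rangle$, without forming $\Phi(x)$ at all:

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 8, 4
Q = rng.standard_normal((n, M))        # [Phi(x_1), ..., Phi(x_M)], Phi = identity
lam, gamma = np.linalg.eigh(Q.T @ Q)   # eigendecomposition of the Gram matrix
P = Q @ gamma / np.sqrt(lam)           # P = (beta_1, ..., beta_M)

x = rng.standard_normal(n)             # a new sample; here Phi(x) = x

y_direct = P.T @ x                     # y = P^T Phi(x), needs Phi explicitly
k_x = Q.T @ x                          # kernel vector k(x)_i = <Phi(x_i), Phi(x)>
y_kernel = (gamma.T @ k_x) / np.sqrt(lam)  # same projection via the kernel only

assert np.allclose(y_direct, y_kernel)
print(y_direct.shape)                  # (4,) -- the projection is M-dimensional
```

Note the output of the projection has $M$ components, one per $\beta_j$, regardless of the dimension of $\Phi(x)$.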
The problem:
Notice that the vector $x$ has dimension $n$ and $\Phi(x)$ maps $x$ into a higher dimension, which suggests $M > n$. But in reality $M$ is less than $n$, because $M$ is the total number of samples and $x$ is a long vector:
Given a set of $M$ training samples $x_1, x_2, \dots, x_M$ in $\mathbb{R}^n$
In this case, each sample $x_i$ is an image flattened into a column vector, and those tend to be very long.
Anyway, if $n > M$, then this equation seems impossible to compute:
$$y = P^T\Phi(x), \quad P = (\beta_1, \beta_2, \dots, \beta_M)$$
Because $\Phi$ is supposed to map $x$ into a higher dimension, but $n$ is already larger than $M$. Does that mean $\Phi(x)$ must reduce the dimension of $x$? That doesn't sound correct either.
Proposed solution:
So the solution must be to change the Gram matrix. From:
$$R = Q^TQ$$
To:
$$R = QQ^T$$
Because now we get an $R$ matrix of size $n \times n$ instead of $M \times M$. Then $\Phi$ does not even need to map $x$ into a higher or lower dimension.
Question:
What should I do? What is correct? I think the paper does not explain the sizes correctly. Are they really assuming that the number of samples should be larger than the number of rows of the $x_i$ vectors?