How to prove that the optimal PCA decoding matrix D is given by the eigenvectors of $X^TX$ when D has more than one column?


Suppose we have a collection of m points $\{x^{(1)}, ..., x^{(m)}\}$ in $R^n$ and we apply lossy compression to store a lower-dimensional version of them. For each point $x^{(i)}\in R^n$ we find a corresponding code vector $c^{(i)}\in R^l$; if $l$ is smaller than $n$, the code points take less memory to store. We can do this with PCA. PCA is defined by the choice of the decoding function: specifically, we choose matrix multiplication to map the code back into $R^n$. Let $g(c) = Dc$, where $D\in R^{n\times l}$ is the matrix defining the decoding. With PCA we are looking for this decoding matrix $D$. That is to say:

$$D^* = \arg\min_D\sqrt{\sum_{i,j}\left(x^{(i)}_j-r(x^{(i)})_j\right)^2}$$

where $r(x^{(i)}) = Dc^{(i)}$ is the reconstruction of $x^{(i)}$ from its code.
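To make the objective concrete, here is a small numerical sketch in numpy (my own illustration, not from any particular source): it stacks the points as rows of a matrix, picks an arbitrary decoder $D$ with orthonormal columns, reconstructs each point as $r(x) = DD^Tx$, and evaluates the error above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l = 100, 5, 2

# Rows of X are the points x^{(i)}.
X = rng.standard_normal((m, n))

# Any n x l matrix with orthonormal columns is a candidate decoder D;
# the encoder is then f(x) = D^T x and the reconstruction r(x) = D D^T x.
D, _ = np.linalg.qr(rng.standard_normal((n, l)))

R = X @ D @ D.T                       # reconstructions r(x^{(i)}), stacked as rows
err = np.sqrt(np.sum((X - R) ** 2))   # the objective above
print(err)
```

Note that the objective is just the Frobenius norm of $X - R$, which is what the compact matrix form below exploits.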

I can find $D^*$ for the case where $l=1$.

Indeed, if we start by considering the case where $l=1$, $D$ is a single vector $d$, and the problem reduces to

$$ d^*= \arg\min_d\sum_i \|x^{(i)}-dd^Tx^{(i)}\|_2^2 \text{ subject to } \|d\|_2 =1 $$

Rewriting the problem in terms of a single design matrix enables a more compact notation. Let $X\in R^{m\times n}$ be the matrix defined by stacking all the vectors describing the points, such that $X_{i,:}={x^{(i)}}^T$.

We can now rewrite the problem as:

$$d^*=\arg\min_d\|X-Xdd^T\|^2_F$$

Subject to $d^Td=1$
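As a sanity check on the $l=1$ result, the following sketch (my own, assuming numpy) computes the top eigenvector of $X^TX$ via `np.linalg.eigh` and verifies empirically that it achieves a lower objective value than random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 6
X = rng.standard_normal((m, n))

# Eigenvector of X^T X with the largest eigenvalue
# (eigh returns eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(X.T @ X)
d_star = evecs[:, -1]

def objective(d):
    """The objective ||X - X d d^T||_F^2 for a unit vector d."""
    return np.linalg.norm(X - X @ np.outer(d, d)) ** 2

best = objective(d_star)

# The top eigenvector should do at least as well as any random unit vector.
for _ in range(1000):
    d = rng.standard_normal(n)
    d /= np.linalg.norm(d)
    assert objective(d) >= best - 1e-9
print(best)
```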

From there I can prove that the optimal $d$ is given by the eigenvector of $X^TX$ corresponding to its largest eigenvalue.

This works for the case $l=1$ and recovers only the first principal component. But what if we want a basis of principal components? Is it possible to prove that the matrix $D$ is given by the $l$ eigenvectors of $X^TX$ corresponding to the $l$ largest eigenvalues?
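While this is not a proof, a quick numerical experiment (my own sketch, assuming numpy) is at least consistent with the claim: taking $D$ to be the $l$ eigenvectors of $X^TX$ with the largest eigenvalues never does worse than random orthonormal decoders.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, l = 150, 8, 3
X = rng.standard_normal((m, n))

# Eigenvectors of X^T X for the l largest eigenvalues
# (eigh returns eigenvalues in ascending order, so take the last l columns).
evals, evecs = np.linalg.eigh(X.T @ X)
D_eig = evecs[:, -l:]

def objective(D):
    """The objective ||X - X D D^T||_F^2 for D with orthonormal columns."""
    return np.linalg.norm(X - X @ D @ D.T) ** 2

best = objective(D_eig)

# Compare against random orthonormal bases of the same dimension.
for _ in range(500):
    Q, _ = np.linalg.qr(rng.standard_normal((n, l)))
    assert objective(Q) >= best - 1e-9
print(best)
```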