I am following this article on face recognition. In the "Calculating Eigenfaces" section, the authors present a solution to the problem of computing the eigenvectors of a very large matrix:

Let $A_{N^2\times M}$ be a dataset of $M$ images, where each column is a flattened $N\times N$ image. Instead of computing the $M$ largest eigenvectors of the $N^2\times N^2$ covariance matrix $C=AA^T$ directly, they compute the $M$ eigenvectors of the matrix $L=A^TA$, which is only $M\times M$.
- Why is this a valid, or at least good-enough, solution?
- What are the criteria for the "largest" eigenvectors? Largest in which sense?
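To make the setup concrete, here is a small numerical sketch of the trick in NumPy. The sizes ($N=32$, $M=10$) and the random data are hypothetical, chosen only to show that the eigenvectors of the small matrix $L=A^TA$ map back (via $A$) to eigenvectors of the large covariance matrix $AA^T$:

```python
import numpy as np

# Hypothetical example: M = 10 images of size 32 x 32, so the
# covariance matrix A A^T would be 1024 x 1024, while L = A^T A
# is only 10 x 10.
rng = np.random.default_rng(0)
N, M = 32, 10
A = rng.standard_normal((N * N, M))   # each column is a flattened image

L = A.T @ A                           # M x M instead of N^2 x N^2
eigvals, V = np.linalg.eigh(L)        # eigenvectors v_i of L

U = A @ V                             # columns A v_i lie in R^{N^2}
U /= np.linalg.norm(U, axis=0)        # normalize each column to unit length

# Check: each column of U is an eigenvector of C = A A^T,
# with the same eigenvalue as the corresponding v_i had for L.
C = A @ A.T
print(np.allclose(C @ U, U * eigvals))   # True
```

The check works because $C(Av_i) = AA^TAv_i = A(Lv_i) = \lambda_i(Av_i)$; normalization only rescales each eigenvector.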
If $A$ and $B^T$ are $m\times n$ matrices (so that both products $AB$ and $BA$ exist), then their characteristic polynomials $p_{AB}(\lambda)$ and $p_{BA}(\lambda)$ are almost the same (see the sketch of a proof on Wikipedia): $$ p_{AB}(\lambda)=(-\lambda)^{m-n}p_{BA}(\lambda). $$ Consequently, their non-zero eigenvalues coincide.
Moreover, for any eigenvector $x$ of $AB$ corresponding to a non-zero eigenvalue $\lambda$, the vector $Bx$ is an eigenvector of $BA$ for the same eigenvalue: $$ (BA)(Bx)=B(AB)x=B(\lambda x)=\lambda(Bx). $$ (Note that $Bx\neq 0$: if $Bx=0$, then $\lambda x=ABx=0$ would force $x=0$, contradicting that $x$ is an eigenvector.)
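Both facts are easy to check numerically. The sketch below uses hypothetical sizes $m=5$, $n=3$ and random matrices; it evaluates both characteristic polynomials at a sample point and verifies the eigenvector mapping $x \mapsto Bx$:

```python
import numpy as np

# Hypothetical sizes: A is m x n, B is n x m, so AB is m x m and BA is n x n.
rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

# p_AB(t) = (-t)^(m-n) * p_BA(t), checked at a sample point t = 2
t = 2.0
lhs = np.linalg.det(A @ B - t * np.eye(m))
rhs = (-t) ** (m - n) * np.linalg.det(B @ A - t * np.eye(n))
print(np.allclose(lhs, rhs))   # True

# If (AB)x = lam*x with lam != 0, then Bx is an eigenvector of BA for lam
lam, X = np.linalg.eig(A @ B)
i = int(np.argmax(np.abs(lam)))       # pick a non-zero eigenvalue
x = X[:, i]
print(np.allclose((B @ A) @ (B @ x), lam[i] * (B @ x)))   # True
```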