I'm doing a report about PCA and SOM. In one toolbox, the documentation says that after calculating the eigenvectors and eigenvalues, it takes the first two eigenvectors (those with the greatest eigenvalues), normalizes them to unit length, and multiplies them by the square roots of the corresponding eigenvalues. I have some questions:
- Why do we take only the two eigenvectors with the largest eigenvalues? Is it just because they account for a large percentage of the variance in the data?
- Why do we normalize the eigenvectors to unit length and then multiply them by the square roots of the corresponding eigenvalues?
Thank you!
A little more context would have been useful, but I will assume you are talking about the use of PCA in initializing SOMs (see here and here).
Yes, by definition the first two PCA eigenvectors span the subspace accounting for the most variance in the data. In the SOM-initialization case specifically, two are used because the map itself is two-dimensional, so two vectors suffice to position the grid in data space. You could use more in other settings, but ultimately it is a balance between eliminating noise and reducing dimensionality versus capturing more of the data's variance.
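To see why the top two eigenvectors are usually enough, here is a minimal sketch (using NumPy, on synthetic data I made up with one dominant direction) that computes the fraction of total variance the two largest eigenvalues account for:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 3-D data: standard deviations 5.0, 2.0, 0.5 along the axes,
# so most of the variance lives in the first two directions.
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])
Xc = X - X.mean(axis=0)                    # center the data first

cov = np.cov(Xc, rowvar=False)             # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
eigvals = eigvals[::-1]                    # sort eigenvalues descending
explained = eigvals / eigvals.sum()        # fraction of variance per PC

print(explained)                # per-component variance fractions
print(explained[:2].sum())      # variance captured by the top two PCs
```

With these standard deviations the first two components capture nearly all of the variance, which is exactly the "account for much percentage of data" intuition from the question.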
It's hard to say offhand, but my suspicion is that it is a way of weighting the "more important" dimension more highly. As for rooting the eigenvalue: recall that the eigenvalues in PCA represent variances, so the square root of an eigenvalue is a standard deviation. It is more natural to scale by standard deviation, since it is in the same units as the data (just as we scale a normal distribution by its standard deviation). So, by scaling by the rooted eigenvalues, it is like we are looking 1 standard deviation out in each of the two PCA dimensions (which naturally gives more space to the dimension with higher variance).
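The procedure the toolbox documentation describes can be sketched as follows. This is only my reading of it, not the toolbox's actual code, and the helper name `pca_init_directions` is one I made up for illustration:

```python
import numpy as np

def pca_init_directions(X):
    """Return the top-2 unit eigenvectors of the covariance of X,
    each scaled by the square root of its eigenvalue, i.e. by one
    standard deviation along that principal direction."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:2]      # indices of two largest
    vecs = eigvecs[:, order]                   # eigh gives unit columns
    vecs = vecs / np.linalg.norm(vecs, axis=0) # normalize explicitly
    return vecs * np.sqrt(eigvals[order])      # scale by std. deviation

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3)) * np.array([3.0, 1.0, 0.2])
V = pca_init_directions(X)
# Each column's length is now one standard deviation in that direction,
# so the first column is longer than the second.
print(np.linalg.norm(V, axis=0))
```

An SOM codebook could then be initialized by laying a regular grid over the plane spanned by these two scaled vectors, centered at the data mean, so the map starts out aligned with the directions and spread of the data.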