How to choose W in Pearson's PCA derivation?

Given that $Y=U\Sigma V^t$ is the singular value decomposition of $Y$, and that we have to keep $k$ singular values $\sigma_i$ for the sake of dimensionality reduction, prove that

$\operatorname{trace}(Y^tWW^tY)=\operatorname{trace}(V\Sigma^t U^tWW^tU\Sigma V^t)$

is maximized (over $W$ with $k$ orthonormal columns, $W^tW=I_k$, which I assume is the intended constraint) when the columns of $W$ are collinear with the $k$ columns of $U$ associated with the $k$ largest singular values.
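
As a sanity check on the claim (not a proof), here is a minimal numerical sketch in NumPy. The sizes `d`, `n`, `k`, the random test matrix, and the helper `objective` are all my own choices for illustration, and I am assuming $W$ is restricted to orthonormal columns. It compares the objective at the top-$k$ left singular vectors with its value at many random orthonormal $W$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 6, 10, 2  # hypothetical sizes, chosen only for illustration

Y = rng.standard_normal((d, n))
# NumPy returns the singular values s sorted in descending order.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

def objective(W):
    """trace(Y^t W W^t Y), which equals the squared Frobenius norm of W^t Y."""
    return np.trace(Y.T @ W @ W.T @ Y)

# Candidate 1: the k columns of U associated with the k largest singular values.
W_top = U[:, :k]
best = objective(W_top)

# Candidate 2: random d x k matrices with orthonormal columns (Q factor of a QR).
random_vals = []
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    random_vals.append(objective(Q))

print("top-k value:          ", best)
print("sum of top-k sigma^2: ", np.sum(s[:k] ** 2))
print("best random W value:  ", max(random_vals))
```

In this sketch the top-$k$ value should match $\sigma_1^2+\dots+\sigma_k^2$, and no random orthonormal $W$ should exceed it, which is consistent with the statement I am trying to prove.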