What is the functional interpretation of the eigenvectors in PCA?


I'm not sure I'm asking this question correctly. It occurred to me that in PCA (closely related to the SVD) we treat the data matrix as if it were a linear transformation by talking about its eigenvectors and eigenvalues. But eigenvectors and eigenvalues implicitly describe the behavior of a matrix as a linear transformation, since they say what happens when you apply the matrix to certain special vectors.

So could you explain to me how/if it is indeed valid to think of an arbitrary data matrix (e.g. an arbitrary data frame) as a linear transformation? Specifically when you have no intention of using it that way, and just want to do PCA on it...

Best answer

As commented by Mason: up to a factor $\frac{1}{n-1}\,$, the square matrix $C=X^\top X$ is the sample covariance matrix (provided the columns of $X$ are centered). Since $C$ is symmetric, it can be diagonalized by an orthogonal matrix $S$: $$ D=S^\top CS\,. $$ The columns of $S$ are the eigenvectors of $C$, and the diagonal elements of $D$ are its eigenvalues. Equivalently, $$ S DS^\top=X^\top X\,. $$ Because $C=X^\top X$ is positive semidefinite, its eigenvalues $d_1,\dots,d_p$ are nonnegative (here $p$ is the number of columns of $X$), so we can write $\sqrt{D}={\rm diag}(\sqrt{d_1},\dots,\sqrt{d_p})\,.$ If $\boldsymbol{x}$ is a random vector whose elements are independent with mean zero and variance one, so that $\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top]=I$, then the covariance matrix of $$ \boldsymbol{y}=S\sqrt{D}\,\boldsymbol{x} $$ is $$ S\sqrt{D}\,\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^\top]\,\sqrt{D}S^\top=SDS^\top=C=X^\top X\,. $$ In other words:

  • the functional interpretation of the eigenvectors $S$ of $C$ is that, together with the scalings $\sqrt{d_i}$, they tell you how to linearly combine independent RVs so that the result has the same covariance as your data $X$.

The whole idea of PCA is to find a variable transformation that transforms the independent RVs $\boldsymbol{x}$ into linear combinations $\boldsymbol{y}$ that have the same covariance matrix as the original data.
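To make this concrete, here is a minimal numerical sketch using NumPy. The data matrix `X`, the mixing matrix `A` used to generate it, and the sample sizes are all made up for illustration. The sketch builds $C$ (including the $\frac{1}{n-1}$ factor), diagonalizes it, and checks that $\boldsymbol{y}=S\sqrt{D}\,\boldsymbol{x}$ built from independent unit-variance inputs has approximately the covariance $C$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples of 3 correlated features (A is an arbitrary mixing matrix).
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.3, 0.2]])
X = rng.standard_normal((500, 3)) @ A.T
X = X - X.mean(axis=0)            # center the columns

n = X.shape[0]
C = X.T @ X / (n - 1)             # sample covariance matrix

# Symmetric eigendecomposition: C = S diag(d) S^T, eigenvalues in ascending order.
d, S = np.linalg.eigh(C)
d = np.clip(d, 0.0, None)         # guard against tiny negative round-off

# M = S sqrt(D): column j of S scaled by sqrt(d_j).
M = S * np.sqrt(d)

# Independent inputs with mean zero and variance one, one sample per row.
x = rng.standard_normal((200_000, 3))
y = x @ M.T                       # each row is (S sqrt(D) x)^T

C_y = np.cov(y, rowvar=False)     # empirical covariance of y

print(np.round(C, 3))
print(np.round(C_y, 3))
```

The two printed matrices should agree up to sampling noise, while `M @ M.T` reproduces `C` up to floating-point error.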

In that context, the covariance matrix $C$ is not interesting as a linear map. In PCA it is more interesting to ask how many elements of $\boldsymbol{x}$ are needed to explain most of the variance of the data.
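One way to answer that question numerically is to look at the cumulative share of the total variance carried by the largest eigenvalues of $C$. The sketch below assumes made-up data (five observed features driven mostly by two latent factors) and an arbitrary 95% threshold; neither comes from the question.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 5 observed features driven mostly by 2 latent factors plus small noise.
latent = rng.standard_normal((1000, 2))
loadings = rng.standard_normal((2, 5))
X = latent @ loadings + 0.05 * rng.standard_normal((1000, 5))
X = X - X.mean(axis=0)                    # center the columns

C = X.T @ X / (X.shape[0] - 1)            # sample covariance matrix

d = np.linalg.eigvalsh(C)[::-1]           # eigenvalues, largest first
explained = np.cumsum(d) / d.sum()        # cumulative fraction of total variance

# Smallest number of components whose cumulative share reaches 95% (threshold is an assumption).
k = int(np.searchsorted(explained, 0.95) + 1)

print(np.round(explained, 4))
print("components needed for 95% of the variance:", k)
```

Each eigenvalue $d_i$ is the variance contributed by the $i$-th element of $\boldsymbol{x}$ after scaling, so a small `k` means a few elements already account for most of the variance.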

You should probably work on a realistic numerical example to get your head around it.