I'm not sure if I'm asking this question correctly, but it occurred to me that in PCA (a.k.a. SVD) we treat the data matrix as if it were a linear transformation by talking about its eigenvectors/eigenvalues. Yet eigenvectors and eigenvalues are implicitly supposed to describe the behavior of a matrix *as* a linear transformation, since they describe what happens when you transform special vectors with the matrix.
So could you explain how/if it is indeed valid to think of an arbitrary data matrix (e.g. an arbitrary data frame) as a linear transformation? Specifically, when you have no intention of using it that way and just want to do PCA on it...
As commented by Mason: up to a factor $\frac{1}{n-1}\,$, the square matrix $C=X^\top X$ is the sample covariance matrix (provided the columns of $X$ are centered). Since $C$ is symmetric it can be diagonalized by an orthogonal matrix $S$: $$ D=S^\top CS\,. $$ The columns of $S$ are the eigenvectors of $C$ and the diagonal elements of $D$ are its eigenvalues. Equivalently, $$ SDS^\top=C=X^\top X\,. $$ Let's write $\sqrt{D}={\rm diag}(\sqrt{d_1},\dots,\sqrt{d_n})\,.$ If $\boldsymbol{x}$ is a random vector whose elements are independent with mean zero and variance one, then the covariance matrix of $$ \boldsymbol{y}=S\sqrt{D}\,\boldsymbol{x} $$ is $$ S\sqrt{D}\,\mathbb{E}\big[\boldsymbol{x}\boldsymbol{x}^\top\big]\sqrt{D}\,S^\top=SDS^\top=C=X^\top X\,. $$ In other words:
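The diagonalization above is easy to check numerically. Here is a minimal NumPy sketch (the matrix sizes and random seed are arbitrary choices, not anything from the question): it builds a small centered data matrix $X$, forms $C=X^\top X$, diagonalizes it, and verifies that $SDS^\top=C$ with $S$ orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small centered data matrix X (rows = observations, columns = variables).
X = rng.standard_normal((50, 3))
X -= X.mean(axis=0)

C = X.T @ X  # proportional to the sample covariance matrix

# Diagonalize the symmetric matrix C with an orthogonal S.
# eigh returns eigenvalues d (ascending) and eigenvectors as columns of S.
d, S = np.linalg.eigh(C)
D = np.diag(d)

assert np.allclose(S @ D @ S.T, C)      # S D S^T = C
assert np.allclose(S.T @ S, np.eye(3))  # S is orthogonal
```

`np.linalg.eigh` is the right tool here (rather than `np.linalg.eig`) because it exploits the symmetry of $C$ and guarantees real eigenvalues and orthonormal eigenvectors.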
The whole idea of PCA is to find a variable transformation that transforms the independent RVs $\boldsymbol{x}$ into linear combinations $\boldsymbol{y}$ that have the same covariance matrix as the original data.
In that context, the covariance matrix $C$ is not interesting as a linear map. The more interesting question in PCA is how few elements of $\boldsymbol{x}$ (equivalently, how few of the eigenvalues $d_i$) are needed to explain most of the variance of the data.
You should probably work through a realistic numerical example to get your head around it.
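In that spirit, here is one such numerical example as a sketch (again with arbitrary sizes and seed): it constructs the transformation $A=S\sqrt{D}$ from a toy data matrix, confirms exactly that $AA^\top=C$, and then checks by simulation that transforming many i.i.d. unit-variance vectors $\boldsymbol{x}$ via $\boldsymbol{y}=A\boldsymbol{x}$ reproduces the covariance $C$ empirically.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy centered data matrix and its (unnormalized) covariance C = X^T X.
X = rng.standard_normal((200, 3))
X -= X.mean(axis=0)
C = X.T @ X

# Diagonalize C and form the transformation y = S sqrt(D) x.
d, S = np.linalg.eigh(C)
A = S @ np.diag(np.sqrt(d))

# Exact check: Cov(y) = A Cov(x) A^T = S D S^T = C.
assert np.allclose(A @ A.T, C)

# Monte Carlo check: transform many iid mean-zero, unit-variance vectors.
x = rng.standard_normal((3, 100_000))
y = A @ x
cov_y = y @ y.T / x.shape[1]  # empirical covariance of y

# The relative deviation from C should be small, shrinking like 1/sqrt(N).
print(np.max(np.abs(cov_y - C)) / np.max(np.abs(C)))
```

The exact check is the whole point: the empirical agreement is just the law of large numbers at work, while $AA^\top=C$ holds identically by the algebra in the answer.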