Principal components analysis (PCA) is often described as finding "linear combinations of the original variables which maximize variance". See for example the discussion here. I am trying to understand how this relates to the SVD. Consider the SVD of an $M \times N$ matrix $A$, $$ A = U S V^T $$ The principal components are the columns of the matrix $U$, so the idea that they are "linear combinations of the original variables" would imply that the columns of $U$ are linear combinations of the columns of $A$. Can anyone help me see how this is true?
If we just rearrange this to get $U = A V S^{-1}$, it seems to me that $U$ has a much more complicated structure than just being linear combinations of the columns of $A$.
First of all, note that $S^{-1}$ is not usually defined, since in the full SVD $S$ is an $M \times N$ matrix. So I will insist on a compact SVD, in which the sizes of $U, S, V$ are $M \times r$, $r \times r$, and $N \times r$, where $r$ is the rank of $A$. Then $S$ is square and, because it contains only the nonzero singular values, necessarily invertible. In any case, this is often (but not universally) what is done in the relevant numerical linear algebra literature.
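To make the shapes concrete, here is a minimal NumPy sketch of a compact SVD. The helper name `compact_svd` is hypothetical; it calls `numpy.linalg.svd` with `full_matrices=False` (the thin SVD, with $k = \min(M, N)$ singular values) and then drops the singular values below a tolerance, which I assume to be the same default threshold `numpy.linalg.matrix_rank` uses. The test matrix is deliberately built with rank 2.

```python
import numpy as np

def compact_svd(A, tol=None):
    """Hypothetical helper: compact SVD A = U S V^T with S r x r and invertible."""
    # Thin SVD: U is M x k, s has length k, Vt is k x N, where k = min(M, N)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if tol is None:
        # Assumed threshold, mirroring numpy.linalg.matrix_rank's default
        tol = s.max() * max(A.shape) * np.finfo(s.dtype).eps
    r = int(np.sum(s > tol))  # numerical rank of A
    # Keep only the r nonzero singular values and the matching columns of U and V
    return U[:, :r], np.diag(s[:r]), Vt[:r, :].T

rng = np.random.default_rng(0)
# A 5 x 4 matrix of rank 2, built as a product of 5 x 2 and 2 x 4 factors
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
U, S, V = compact_svd(A)

print(U.shape, S.shape, V.shape)     # M x r, r x r, N x r
print(np.allclose(U @ S @ V.T, A))   # the compact SVD still reconstructs A
```

With the zero singular values removed, `np.linalg.inv(S)` is well defined, which is what the rearrangement $U = AVS^{-1}$ needs.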
With that established, we indeed have $$ U = A VS^{-1}, $$ so that $U = AB$ for some matrix $B$ (namely the $N \times r$ matrix $B = VS^{-1}$). This is, on its own, enough to deduce that the columns of $U$ are linear combinations of the columns of $A$. Indeed, if we use $B_j$ to denote the $j$th column of $B$, $b_{ij}$ to denote its entries, and $A_i$ to denote the $i$th column of $A$, then we have $$ U_j = [AB]_j = A B_j = \sum_{i=1}^N b_{ij}A_i. $$
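The argument can be checked numerically. This is a sketch under the assumption that $A$ is a random Gaussian matrix, which has full rank almost surely, so the thin SVD from `numpy.linalg.svd(..., full_matrices=False)` is already compact here. It forms the coefficient matrix $B = VS^{-1}$ and rebuilds each column $U_j$ explicitly as $\sum_{i=1}^N b_{ij} A_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 4
A = rng.standard_normal((M, N))  # full column rank almost surely, so r = N

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# B = V S^{-1}: broadcasting divides column k of V by s[k], i.e. V @ diag(1/s)
B = V / s

# U = A B up to rounding error
print(np.allclose(U, A @ B))

# Each column of U as an explicit linear combination of the N columns of A,
# with coefficients taken from the matching column of B
U_rebuilt = np.column_stack(
    [sum(B[i, j] * A[:, i] for i in range(N)) for j in range(U.shape[1])]
)
print(np.allclose(U, U_rebuilt))
```

The coefficients $b_{ij}$ are just the entries of $V$ scaled by $1/s_j$, so the "more complicated structure" of $VS^{-1}$ only affects which weights are used, not the fact that each $U_j$ lies in the column space of $A$.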