I am trying to better understand Principal Component Analysis. It is my understanding that our covariance matrix $S\in M_n(\mathbb{R})$ is symmetric and therefore normal. Its normality implies that it is unitarily diagonalizable, that is,
$$S=VDV^{'}$$
where $V$ is a unitary matrix whose columns are the eigenvectors of $S$ and $D$ is a diagonal matrix of the corresponding eigenvalues. Eigenvectors point in the directions that a matrix merely stretches, but what is the linear transformation imposed by $S$? Do I consider the covariance matrix as the transformation taking the standard basis to the columns of $S$, or as a change of coordinates from the basis formed by the columns of $S$ to the standard basis $\{E_1, E_2, \dots, E_n\}$? How do I interpret the columns of $S$? My intuition tells me that the linear transformation pulls vectors in $\mathbb{R}^n$ towards the axis that contains the most variability. For example, the matrix
$$M=\begin{bmatrix}3&1\\ 1&10\end{bmatrix}$$
should, by this intuition, pull vectors towards the $y$-axis.
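A minimal NumPy sketch to check this intuition numerically (it uses only the example matrix $M$ above; nothing here comes from real data):

```python
import numpy as np

# The example matrix from above.
M = np.array([[3.0, 1.0],
              [1.0, 10.0]])

# Unitary (orthogonal) diagonalization of the symmetric matrix: M = V D V'.
eigvals, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
print(eigvals)                                  # approx [2.86, 10.14]
print(eigvecs[:, -1])                           # dominant eigenvector, close to the y-axis (up to sign)
print(np.allclose(eigvecs @ np.diag(eigvals) @ eigvecs.T, M))  # True

# Apply M to a few unit vectors: the normalized images move towards the
# dominant eigenvector, i.e. roughly towards the y-axis.
angles = np.linspace(0, np.pi, 5)
units = np.column_stack([np.cos(angles), np.sin(angles)])      # unit vectors as rows
images = units @ M.T                                           # M applied to each row vector
print(images / np.linalg.norm(images, axis=1, keepdims=True))
```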
Secondly, it is my understanding that the trace of our matrix $S$ is both the sum of the variances of our data and the sum of the eigenvalues. If we choose the eigenvalues that capture the majority of the trace of $S$, then we capture the majority of the variability in our data. If I consider a matrix $S\in M_3$ and the first two eigenvalues explain most of the variability, does this imply that the majority of the spread of my data is found along the first two axes ($x$ and $y$), while along the third axis ($z$) the data is highly concentrated around its mean? That is, my data can be reduced to two dimensions without much loss in its shape. If my intuition above about the interpretation of the linear transformation of $S$ is correct, then the eigenvectors will point towards the axes that have the most variability. So projection onto the first two eigenvectors will capture most of the variability, since the eigenvalues can be seen to represent the amount of variability along those directions.
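For example, with an invented $3 \times 3$ covariance matrix (the numbers below are made up purely for illustration), this can be checked directly:

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (values invented for illustration).
S = np.array([[4.0, 1.5, 0.2],
              [1.5, 3.0, 0.1],
              [0.2, 0.1, 0.3]])

eigvals, eigvecs = np.linalg.eigh(S)
eigvals = eigvals[::-1]                 # sort eigenvalues in descending order
eigvecs = eigvecs[:, ::-1]

# trace(S) = sum of the variances (diagonal) = sum of the eigenvalues
print(np.trace(S), S.diagonal().sum(), eigvals.sum())

# Proportion of total variability captured by the first two eigenvalues.
print(eigvals[:2].sum() / eigvals.sum())
```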
If I only consider the first two eigenvectors $V=[V_1,V_2]$, then the span of the columns is a subspace of $\mathbb{R}^3$. Then with the projection matrix $V(V^{'}V)^{-1}V^{'}$ (which reduces to $VV^{'}$ since the columns of $V$ are orthonormal) I can project my data onto a plane (the column space of $V$). Note that because of orthogonality the projection could also be done one eigenvector at a time. If I am wrong please correct me. Therefore we project onto the eigenvectors since they point in the directions that explain most of our variability.
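Continuing the made-up example, a sketch of this projection step (the data are simulated from the invented covariance matrix, again only for illustration):

```python
import numpy as np

# Invented covariance matrix and synthetic data, for illustration only.
rng = np.random.default_rng(0)
S = np.array([[4.0, 1.5, 0.2],
              [1.5, 3.0, 0.1],
              [0.2, 0.1, 0.3]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=S, size=500)

eigvals, eigvecs = np.linalg.eigh(S)
V = eigvecs[:, ::-1][:, :2]             # eigenvectors of the two largest eigenvalues

# For orthonormal columns, V (V'V)^{-1} V' reduces to V V'.
P = V @ np.linalg.inv(V.T @ V) @ V.T
print(np.allclose(P, V @ V.T))          # True

# Projection onto the plane, and the same projection one eigenvector at a time.
X_proj = X @ P.T
X_one_at_a_time = sum(np.outer(X @ v, v) for v in V.T)
print(np.allclose(X_proj, X_one_at_a_time))  # True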
You might find the following Python simulation of correlated data helpful for getting some insight into the relationship between the covariance matrix and its eigenvectors. It is based on a $2 \times 2$ covariance matrix, but should help in working through your arguments. The image below was generated using the simulation code. I hope this helps.
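A minimal sketch of such a simulation (the covariance values below are just an example I chose; the plot shows the simulated points together with the eigenvectors of the sample covariance, scaled by the square roots of their eigenvalues):

```python
import numpy as np
import matplotlib.pyplot as plt

# Draw correlated 2-D data from a chosen covariance matrix.
rng = np.random.default_rng(42)
cov = np.array([[3.0, 2.0],
                [2.0, 4.0]])            # example covariance matrix (chosen arbitrarily)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)

S = np.cov(X, rowvar=False)             # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)

plt.scatter(X[:, 0], X[:, 1], s=5, alpha=0.3)
for lam, v in zip(eigvals, eigvecs.T):
    # Each eigenvector drawn from the origin, scaled by sqrt(eigenvalue)
    # (one standard deviation along that direction).
    plt.quiver(0, 0, *(np.sqrt(lam) * v),
               angles='xy', scale_units='xy', scale=1, color='red')
plt.axis('equal')
plt.show()
```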