I'm a final year maths undergrad doing a course in multivariate data analysis, but I'm really struggling with the linear algebra. In particular the “projection of the data along the 1st k principal components” is mentioned in the notes as
$$XV_k$$
but the projection of a vector along a vector is defined as
$$\operatorname{proj}_{\mathbf v}\mathbf u = \frac{\mathbf u \cdot \mathbf v}{\mathbf v \cdot \mathbf v}\,\mathbf v.$$
To me, since the projection of one vector along another vector is itself a vector, the “projection of the data along the 1st k principal components” could not be stored in a single matrix, but would have to live in a k by (n by p, or p by n) array. The matrix $XV_k$ does contain the scaling factors for every single data point along the 1st k principal components, but it says nothing about direction. Can anyone explain what I'm not getting here? Cheers, Douglas
I think you've got it right; the confusion comes from semantics.
The word "projection" as used in the PCA / machine learning field is subtly different from the second definition you quoted, the one from geometry / linear algebra (projecting a vector onto another vector, which yields a vector).
In PCA language, "projection" usually refers to just the projection coefficient, i.e. the signed length of the projection of a data point onto the (unit) vector $v_k$. You can think of it as a "coordinate" value on the basis vector $v_k$.
To compare them: for a data point $x$, the projection coefficient is the scalar $x^\top v_k$, while the geometric projection is the vector $(x^\top v_k)\,v_k$. Stacking the scalars over all $n$ points and the first $k$ components gives exactly the $n \times k$ matrix $XV_k$; since the coefficients are coordinates in the orthonormal basis $v_1, \dots, v_k$, no directional information is lost — the vectors can be recovered as $XV_kV_k^\top$.
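A minimal NumPy sketch of the distinction (the data and variable names are my own, not from your notes): `scores` holds the scalar coefficients $XV_k$, while `proj_vectors` holds the actual projected vectors in the original space.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # n = 100 points, p = 3 features
X = X - X.mean(axis=0)               # centre the data, as is standard before PCA

# Principal components: eigenvectors of the sample covariance matrix
cov = X.T @ X / (X.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
V = eigvecs[:, np.argsort(eigvals)[::-1]]    # reorder to descending variance

k = 2
V_k = V[:, :k]                       # p x k, first k principal components

scores = X @ V_k                     # n x k matrix of projection coefficients
proj_vectors = scores @ V_k.T        # n x p matrix of geometric projections

# Row i of proj_vectors is sum_j scores[i, j] * V_k[:, j]:
# the coefficients plus the known basis vectors reproduce the vectors.
```

Note the shapes: the coefficients fit in an $n \times k$ matrix precisely because the directions $v_1, \dots, v_k$ are shared by all data points, so they don't need to be stored per point.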