I am trying to answer the question phrased as follows:
Give a detailed explanation of Principal Component Analysis. Your explanation should include explanations of the terms: geometric information; covariance matrix; orthogonal transformation; Spectral Theorem and describe how the technique can be used to reduce dimensionality while retaining much geometric information
My understanding of Principal Component Analysis is that it reduces a number of variables x1, x2, ... to a smaller set of principal components that store as much of the information from the original variables as possible.
For example, suppose one were to reduce two attributes of a car, say speed and engine size, to one principal component. The data points would be plotted on an xy plane and then brought together into a new line of best fit, putting these points through an orthogonal transformation that preserves the points' original distances from one another.
The covariance matrix measures how variations in pairs of variables are linked to each other, and its diagonal values are always equal to 0. So in this example it would store the variance of the cars' speeds and engine sizes.
The covariance matrix is then used to calculate the relevant set of eigenvalues and eigenvectors.
Dimensionality can be reduced by then selecting the k largest eigenvectors as the new k principal components, which represent as much of the variance as possible with as few variables as possible. The more the dimensionality is reduced (i.e. the more principal components that are removed), the less of the variance of the original variables (or geometric information) is captured in the final result.
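To make this concrete, here is a rough NumPy sketch of the procedure as I have described it (the data and the variable names are made up by me, and I am not claiming this is a reference implementation):

```python
import numpy as np

# Made-up data: one row per car, columns are the variables (speed, engine size).
X = np.array([[120.0, 1.2],
              [150.0, 1.6],
              [180.0, 2.0],
              [200.0, 2.5],
              [220.0, 3.0]])

# Covariance matrix of the two variables.
C = np.cov(X, rowvar=False)

# Eigenvalues and eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)   # eigh is for symmetric matrices

# Keep the k eigenvectors belonging to the k largest eigenvalues;
# these columns are what I would call the principal components.
k = 1
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:k]]
print(components)
```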
My two questions are:
- How does the spectral theorem relate to PCA?
- Have I provided a detailed enough explanation of what PCA does otherwise?
Any help would be greatly appreciated!
Answers, in order.
I recommend that you review (or look up) what exactly the spectral theorem says about real, symmetric matrices. The role of the spectral theorem is embedded in the step where you take the eigenvectors of the covariance matrix and in the orthogonal transformation you mention.
Recall that not every matrix has a full set of eigenvectors. The spectral theorem, however, ensures that our (symmetric) covariance matrix not only has a complete set of eigenvectors, but also that those eigenvectors can be taken to be orthonormal, so that the principal components (conveniently) form an orthonormal basis for the relevant subspaces.
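If it helps to see what the theorem buys you, here is a small NumPy check on made-up data (my own sketch, not part of your write-up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # made-up data: 100 observations, 3 variables
C = np.cov(X, rowvar=False)          # real, symmetric covariance matrix

eigvals, Q = np.linalg.eigh(C)       # eigendecomposition of a symmetric matrix

# The spectral theorem guarantees an orthonormal eigenbasis:
print(np.allclose(Q.T @ Q, np.eye(3)))             # the columns of Q are orthonormal
print(np.allclose(Q @ np.diag(eigvals) @ Q.T, C))  # C = Q diag(eigvals) Q^T
```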
Regarding your second question, "enough" is a matter of taste. However, here are my two cents.
First of all, you never explain the term "geometric information." Going through your explanation in detail:
It is not clear what you mean when you say the points are "brought together into a new line of best fit." Note also that there is a difference between what is typically referred to as the "line of best fit" and the line corresponding to the first principal component. Do you know what this difference is?
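As a nudge in that direction, you can check numerically that the two lines are generally not the same (a quick NumPy sketch on made-up correlated data; the names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)   # made-up correlated data

# Slope of the ordinary least-squares "line of best fit".
ols_slope, _ = np.polyfit(x, y, 1)

# Slope of the direction given by the first principal component.
C = np.cov(x, y)
eigvals, eigvecs = np.linalg.eigh(C)
v = eigvecs[:, -1]                  # eigenvector for the largest eigenvalue
pc_slope = v[1] / v[0]

print(ols_slope, pc_slope)          # the two slopes generally differ
```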
The diagonal values of the covariance matrix give the variance of a given variable, which is not generally zero.
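As a quick check with made-up numbers (a NumPy sketch of mine):

```python
import numpy as np

# Hypothetical speeds and engine sizes for five cars.
speed = np.array([120.0, 150.0, 180.0, 200.0, 220.0])
engine = np.array([1.2, 1.6, 2.0, 2.5, 3.0])

C = np.cov(speed, engine)   # 2x2 sample covariance matrix

# The diagonal holds the sample variances of the two variables, not zeros.
print(np.diag(C))
print(np.var(speed, ddof=1), np.var(engine, ddof=1))   # matches the diagonal
```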
Your sentence "the covariance matrix is then used to calculate the relevant set of eigenvalues and eigenvectors" is strangely phrased, and it leaves the hanging question of what matrix this "relevant set" of eigenvalues and eigenvectors belongs to. Instead, I would say that "the eigenvalues and eigenvectors of the covariance matrix are then calculated."
Finally, what exactly do you do with the eigenvectors that you selected? The eigenvectors in question are unit vectors; what is it that we are supposed to do with them to extract the information from our data set?