I don't understand the intuition behind zero-mean in a covariance matrix. A resource states the following:
https://datascienceplus.com/understanding-the-covariance-matrix/
- Where does the zero mean come from?
- Which data and vectors get transformed here? If we are looking at height and weight as x and y values, e.g., does this get translated into the covariance matrix? Or is the covariance matrix a set of values that helps us rotate the actual data vectors toward the direction with the most variance? I just don't understand how we derive the covariance matrix, why we have it, and to which data set it gets applied in PCA.
The zero-mean formula comes from setting the mean $\bar X=0$ and defining $$X=\begin{bmatrix}X_1 & \ldots & X_n\end{bmatrix}\in\mathbb{R}^{d\times n}.$$
Note that the dimensions of $X$ are incorrect in the document you posted.
If we expand $XX^T$, we get that
$$XX^T=\begin{bmatrix}X_1 & \ldots & X_n\end{bmatrix}\begin{bmatrix}X_1 & \ldots & X_n\end{bmatrix}^T=\sum_{i=1}^nX_iX_i^T,$$
from which, after dividing by $n$ (or $n-1$ for the unbiased estimator), we recover the zero-mean covariance matrix.
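To see the identity numerically, here is a minimal numpy sketch (the data is synthetic, purely for illustration): after centering, $\frac{1}{n}XX^T$ equals the sum of the rank-one terms $X_iX_i^T$ divided by $n$, and matches numpy's own covariance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))            # d=2 features, n=100 samples; columns are the X_i
X = X - X.mean(axis=1, keepdims=True)    # center the data so the mean is zero

n = X.shape[1]
C = (X @ X.T) / n                        # zero-mean covariance matrix
C_sum = sum(np.outer(X[:, i], X[:, i]) for i in range(n)) / n

print(np.allclose(C, C_sum))             # True: the expansion above
print(np.allclose(C, np.cov(X, bias=True)))  # True: matches numpy's (1/n) estimator
```

With `bias=True` numpy divides by $n$; the default `np.cov` divides by $n-1$ instead.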
Your second point is hard to follow, so let me address the general idea. The covariance matrix measures the variability in your data. For instance, if the first component of the $X_i$'s contains height, the corresponding variance indicates how variable the heights in your dataset are: a high variance means high variability, a low variance means low variability.
Now if the second component is weight, the covariance between height and weight tells you how they co-vary. A positive covariance means the correlation is positive: large heights are associated with large weights, and vice versa. A negative covariance means the correlation is negative: large heights are associated with low weights, and vice versa.
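A quick illustration with made-up height/weight numbers (purely hypothetical data): since the two variables rise together, the off-diagonal entry of the covariance matrix is positive.

```python
import numpy as np

# Hypothetical data: heights (cm) and weights (kg) that tend to rise together
height = np.array([160.0, 165.0, 170.0, 175.0, 180.0])
weight = np.array([ 55.0,  60.0,  66.0,  72.0,  80.0])

cov = np.cov(height, weight)   # 2x2 covariance matrix
# cov[0, 0] is the variance of height, cov[1, 1] the variance of weight,
# and cov[0, 1] = cov[1, 0] is the covariance between the two.
print(cov[0, 1] > 0)           # True: taller people in this sample weigh more
```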
If you want to do dimensionality reduction, the covariance matrix can be decomposed into a sum of rank-one matrices obtained from its eigenvalue decomposition. Each of those matrices "explains" part of the data at the covariance level. They can be ranked in importance by ordering the associated eigenvalues and keeping those with the largest ones.
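The decomposition can be sketched in numpy (again with synthetic data): the eigendecomposition of the symmetric covariance matrix $C$ gives $C=\sum_i \lambda_i v_iv_i^T$, and keeping only the terms with the largest $\lambda_i$ is the truncation PCA performs.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 200))        # 3 features, 200 samples
C = np.cov(A)                        # 3x3 symmetric covariance matrix

vals, vecs = np.linalg.eigh(C)       # eigh: for symmetric matrices, eigenvalues ascending
# C equals the sum of rank-one matrices lambda_i * v_i v_i^T
C_rebuilt = sum(vals[i] * np.outer(vecs[:, i], vecs[:, i]) for i in range(len(vals)))
print(np.allclose(C, C_rebuilt))     # True

# Keeping only the largest eigenvalue gives the best rank-one approximation of C
top = np.argmax(vals)
C_rank1 = vals[top] * np.outer(vecs[:, top], vecs[:, top])
```

The eigenvector `vecs[:, top]` is the first principal direction: projecting the data onto it retains the largest share of the variance.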