I am trying to understand the objective function of PCA. Intuitively, I understand that we are trying to find the direction along which the variance of the data points projected onto that direction is maximal.
The formulation of objective function is stated here: https://stats.stackexchange.com/a/10256/176418
But I don't understand the first equation itself, which is as follows:
$$\frac{1}{n}\sum_{i=1}^{n} x_i x_i^T = \frac{X^TX}{n}$$
where each $x_i$ is a vector of $p$ features and $X$ is the $n \times p$ matrix whose $i$th row is $x_i^T$ (the data matrix).
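Writing both sides entry by entry, using $X_{ij} = (x_i)_j$ (because the $i$th row of $X$ is $x_i^T$), shows why the two sides agree. The indices $j, k$ below are introduced only for this derivation:

$$
\left(X^T X\right)_{jk} = \sum_{i=1}^{n} X_{ij} X_{ik} = \sum_{i=1}^{n} (x_i)_j (x_i)_k = \sum_{i=1}^{n} \left(x_i x_i^T\right)_{jk}, \qquad j, k = 1, \dots, p
$$

Dividing both sides by $n$ gives $\frac{1}{n}\sum_{i=1}^{n} x_i x_i^T = \frac{X^TX}{n}$.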
How is the sum of the variances of the data points, $\sum_{i=1}^{n} x_i x_i^T$, the same as the covariance matrix of the entire data matrix, $X^TX/n$?
Isn't this sum a scalar, whereas the covariance matrix is of size $p \times p$, where $p$ is the number of features, as mentioned in the reference above?
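The scalar-versus-matrix question comes down to the order of the product. A minimal NumPy sketch, assuming a single hypothetical data point with $p = 3$ features stored as a column vector, compares the outer product $x x^T$ with the inner product $x^T x$:

```python
import numpy as np

# Hypothetical data point with p = 3 features, stored as a (p, 1) column vector.
x = np.array([[1.0], [2.0], [3.0]])

outer = x @ x.T  # x x^T: outer product, shape (p, p)
inner = x.T @ x  # x^T x: inner product, shape (1, 1), i.e. effectively a scalar

print(outer.shape)   # (3, 3)
print(inner.item())  # 14.0
```

Only $x^T x$ collapses to a number (here $1 + 4 + 9$); $x x^T$, the form that appears in the sum, is a symmetric $p \times p$ matrix.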
The same thing is also mentioned here: http://cs229.stanford.edu/notes/cs229-notes10.pdf (page 5), where this sum is called the empirical covariance matrix.
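The name "empirical covariance matrix" fits once the data are centered. A minimal sketch, assuming randomly generated hypothetical data that is mean-centered first, compares $\frac{1}{n}X^TX$ with NumPy's `np.cov`, using `bias=True` so that it also divides by $n$ rather than $n - 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))  # hypothetical data: n rows (points), p columns (features)
X = X - X.mean(axis=0)       # center each feature to zero mean

emp_cov = X.T @ X / n        # (1/n) X^T X, shape (p, p)

# rowvar=False: columns are the variables; bias=True: normalize by n, not n - 1.
ref_cov = np.cov(X, rowvar=False, bias=True)

print(emp_cov.shape)                  # (3, 3)
print(np.allclose(emp_cov, ref_cov))  # True
```

Without the centering step, $\frac{1}{n}X^TX$ would be the matrix of raw second moments rather than the covariance matrix.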
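I suppose you've made a mistake: $\sum\limits_{i=1}^n x_{i} x_{i}^T = X^T X$ (without the $\frac{1}{n}$ factor). Also, the sum is not a scalar: each term $x_i x_i^T$ is the outer product of a $p \times 1$ vector with a $1 \times p$ vector, i.e. a $p \times p$ matrix, so the sum is a $p \times p$ matrix as well. Dividing it by $n$ gives $\frac{1}{n}X^TX$, which is the empirical covariance matrix when the data have been centered to zero mean.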
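The identity can be checked numerically. A minimal NumPy sketch, assuming a small hypothetical random data matrix whose $i$th row plays the role of $x_i^T$, sums the outer products directly and compares the result with `X.T @ X`:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 5, 3
X = rng.normal(size=(n, p))  # hypothetical data matrix; row i is x_i^T

# Sum of the outer products x_i x_i^T; each term is a (p, p) matrix.
S = sum(np.outer(X[i], X[i]) for i in range(n))

print(S.shape)                  # (3, 3)
print(np.allclose(S, X.T @ X))  # True
```

`np.outer` is used so that each row is treated as $x_i x_i^T$ without reshaping it into a column vector first.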