Why do we scale by $\frac{1}{N-1}$ while calculating the covariance matrix in PCA?

When we perform Principal Component Analysis (PCA) on a set of $N$ $d$-dimensional vectors, we scale the covariance matrix by a factor of $\frac{1}{N-1}$.
Here's what we do in PCA:

  1. We calculate the mean of all the d-dimensional vectors
  2. We subtract the mean from each vector
  3. We calculate the covariance matrix:
    $C = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})\cdot(x_i-\bar{x})^T$

and so on.
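The three steps above can be sketched in NumPy as follows. The data matrix `X` (one vector per row) and its size are made up purely for illustration; the final check relies on `np.cov` using the $\frac{1}{N-1}$ scaling by default.

```python
import numpy as np

# Hypothetical data: N = 6 samples, each a d = 3 dimensional vector (one per row).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
N = X.shape[0]

# Step 1: mean of all the d-dimensional vectors.
x_bar = X.mean(axis=0)

# Step 2: subtract the mean from each vector.
Xc = X - x_bar

# Step 3: sum of outer products (x_i - x_bar)(x_i - x_bar)^T, scaled by 1/(N-1).
C = (Xc.T @ Xc) / (N - 1)

# np.cov applies the same 1/(N-1) scaling by default.
assert np.allclose(C, np.cov(X, rowvar=False))
```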
In the third step, why do we multiply by a factor of $\frac{1}{N-1}$ rather than $\frac{1}{N}$?

There is 1 best solution below

This is for exactly the same reason that the standard estimator for the variance divides by $N-1$ instead of $N$. If you divide by $N$ instead, the estimator you get is biased: on average it underestimates the population variance by a factor of $\frac{N-1}{N}$, as you can see by directly calculating its expected value.
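For the one-dimensional case, the expected-value calculation can be sketched as follows. The assumptions here are mine, not stated in the question: the $x_i$ are independent draws from a distribution with mean $\mu$ and variance $\sigma^2$. Expanding around $\mu$ gives

$$\sum_{i=1}^{N}(x_i-\bar{x})^2 = \sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2,$$

and since $E\left[(x_i-\mu)^2\right]=\sigma^2$ and $E\left[(\bar{x}-\mu)^2\right]=\frac{\sigma^2}{N}$,

$$E\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right] = N\sigma^2 - N\cdot\frac{\sigma^2}{N} = (N-1)\,\sigma^2.$$

Dividing the sum by $N$ therefore gives an expected value of $\frac{N-1}{N}\sigma^2$, while dividing by $N-1$ gives exactly $\sigma^2$. The same argument applies entry by entry to the covariance matrix $C$.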

There are various intuitive ways to explain this. One is that by replacing $\mu$ with $\bar{x}$, you "lose a degree of freedom": the differences between the values and the sample mean are not independent, because they always sum to zero, so only $N-1$ of them can vary freely. This perspective is useful in a number of similar contexts where you have to subtract something from the sample size to get the appropriate estimator.

Another is to think about the extreme case $N=1$: here $\bar{x}=x_1$, so the sum of squared differences is zero, and if you divide by $N$ you will always get zero, which is clearly not a good estimate for the population variance in general.
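A quick simulation can make both points concrete. This is only a sketch under assumed settings (normally distributed data, true variance `sigma2 = 4.0`, and a helper `mean_estimate` that I made up for the demo); it averages each estimator over many repeated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0      # true population variance (assumed for the demo)
trials = 200_000  # number of repeated samples per sample size

def mean_estimate(N, ddof):
    """Average the variance estimator with divisor (N - ddof) over many samples."""
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
    return samples.var(axis=1, ddof=ddof).mean()

# Dividing by N with a single sample: the estimate is always exactly zero.
biased_n1 = mean_estimate(1, ddof=0)

# For N = 5, dividing by N underestimates sigma2 by roughly (N-1)/N,
# while dividing by N-1 is close to sigma2 on average.
biased_n5 = mean_estimate(5, ddof=0)
unbiased_n5 = mean_estimate(5, ddof=1)

print(biased_n1, biased_n5, unbiased_n5)
```

With these settings the divide-by-$N$ average for $N=5$ should land near $\frac{4}{5}\cdot 4 = 3.2$, and the divide-by-$(N-1)$ average near $4$.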