When we perform Principal Component Analysis (PCA) on a set of $N$ $d$-dimensional vectors, we scale the covariance matrix by a factor of $\frac{1}{N-1}$.
Here's what we do in PCA:
- We calculate the mean of all the d-dimensional vectors
- We subtract the mean from each vector
- We calculate the covariance matrix:
$C = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})\cdot(x_i-\bar{x})^T$
and so on.
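The three steps above can be sketched in NumPy as follows. This is only an illustration: the function name `sample_covariance` and the random test data are assumptions made for the example, and the vectors are assumed to be stored as the rows of an $N \times d$ array.

```python
import numpy as np

def sample_covariance(X):
    """Covariance of the rows of X (shape N x d), using the 1/(N-1) factor."""
    N = X.shape[0]
    mean = X.mean(axis=0)        # step 1: mean of the N vectors
    centered = X - mean          # step 2: subtract the mean from each vector
    # step 3: sum of the outer products, scaled by 1/(N-1)
    return centered.T @ centered / (N - 1)

# Hypothetical data: 50 vectors in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
C = sample_covariance(X)

# np.cov uses the same 1/(N-1) scaling by default.
print(np.allclose(C, np.cov(X, rowvar=False)))
```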
In the third step, why do we multiply by a factor of $\frac{1}{N-1}$ rather than $\frac{1}{N}$?
It is for exactly the same reason that the standard estimator for the variance divides by $N-1$ instead of $N$. If you divide by $N$, the estimator you get is biased: directly calculating its expected value gives $\frac{N-1}{N}$ times the true variance.
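For reference, here is a sketch of that calculation in the scalar case, assuming the $x_i$ are independent and identically distributed with mean $\mu$ and finite variance $\sigma^2$ (these symbols are introduced here for the derivation). First rewrite the sum of squares around $\mu$:

$$\sum_{i=1}^{N}(x_i-\bar{x})^2 = \sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2$$

Taking expectations, and using $E\left[(x_i-\mu)^2\right]=\sigma^2$ and $E\left[(\bar{x}-\mu)^2\right]=\frac{\sigma^2}{N}$:

$$E\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right] = N\sigma^2 - N\cdot\frac{\sigma^2}{N} = (N-1)\,\sigma^2$$

So dividing the sum by $N-1$ gives an estimator with expected value $\sigma^2$, while dividing by $N$ gives $\frac{N-1}{N}\sigma^2$. The same argument applies to the covariance matrix if the squares are replaced by the outer products $(x_i-\bar{x})(x_i-\bar{x})^T$.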
There are various intuitive ways to explain this. One is that by replacing the true mean $\mu$ with the sample mean $\overline{x}$, you "lose a degree of freedom", since the differences between the values and the sample mean are not independent: they always sum to zero. This perspective is useful in a number of similar contexts where you have to subtract something from the sample size to get the appropriate estimator.
Another is to think about the extreme case $N=1$: here the sample mean equals the single observation, so if you divide by $N$ you will always get zero, which is clearly not a good estimate for the population variance in general.
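The bias can also be checked numerically. The following is a rough simulation sketch, not a proof: the sample size, the number of trials, and the normal distribution with variance 4 are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, true_var = 5, 200_000, 4.0

# Each row is one independent sample of size N from a normal
# distribution with variance true_var (illustrative choice).
samples = rng.normal(loc=1.0, scale=np.sqrt(true_var), size=(trials, N))

# Average each estimator over many samples to approximate its expected value.
biased = samples.var(axis=1, ddof=0).mean()    # divide by N
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by N - 1

print(f"divide by N:     {biased:.3f} (theory: {(N - 1) / N * true_var:.3f})")
print(f"divide by N - 1: {unbiased:.3f} (theory: {true_var:.3f})")
```

With a small $N$ the gap between the two averages is easy to see; it shrinks as $N$ grows, since $\frac{N-1}{N}$ approaches 1.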