What is the correct formula for the covariance matrix?


I am working through principal component analysis (PCA) and have reached a step where I need to calculate the covariance matrix, but I am seeing several versions of the formula. Here are the ones I have found:


[Image: the two formulas, one with $n$ and one with $n-1$ in the denominator]


The $n$ or $n-1$ is confusing me. What is the correct formula?

This source solves using $n$, whereas this video solves using $n - 1$ in the denominator.


There is 1 best solution below


There is no correct or incorrect here.

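Dividing by $n-1$ instead of $n$ is called Bessel's correction.

It corrects the bias in our estimate of the variance: we do not know the true population mean or variance, so we measure deviations from the sample mean, and that makes the uncorrected estimate too small on average.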
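To make the two variants concrete, here they are written out side by side. The notation is my own assumption, since the formulas in the question are not reproduced here: $x_1, \dots, x_n$ are the observations (column vectors), $\bar{x}$ is their sample mean, and $\Sigma$ is the true population covariance.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
S_{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}, \qquad
S_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}
$$

For independent, identically distributed observations with finite covariance,

$$
\mathbb{E}\left[S_{n-1}\right] = \Sigma, \qquad
\mathbb{E}\left[S_{n}\right] = \frac{n-1}{n}\,\Sigma ,
$$

so $S_n$ underestimates $\Sigma$ by the factor $\frac{n-1}{n}$, which becomes negligible as $n$ grows.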

  • Most statistics textbooks use this correction when explaining covariance (especially when the focus is on applying the formula rather than on the underlying theory).
  • Most packages (e.g. MATLAB, NumPy for Python) apply this correction by default in their covariance function.
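As a quick check of the NumPy default, here is a minimal sketch. The data matrix `X` is made up purely for illustration (rows are observations, columns are variables); the manual formulas are compared against `np.cov` with and without `bias=True`.

```python
import numpy as np

# Hypothetical toy data: 5 observations (rows) of 2 variables (columns).
X = np.array([
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 2.0],
    [8.0, 5.0],
    [10.0, 4.0],
])
n = X.shape[0]

# Center each column on its sample mean.
centered = X - X.mean(axis=0)

# The two variants, computed by hand.
cov_manual_n_minus_1 = centered.T @ centered / (n - 1)  # with Bessel's correction
cov_manual_n = centered.T @ centered / n                # without the correction

# NumPy: rowvar=False says that columns are the variables.
cov_default = np.cov(X, rowvar=False)            # default divides by n - 1
cov_biased = np.cov(X, rowvar=False, bias=True)  # bias=True divides by n

print(np.allclose(cov_default, cov_manual_n_minus_1))
print(np.allclose(cov_biased, cov_manual_n))
```

One thing to watch for: `np.var` and `np.std` default to `ddof=0` (dividing by $n$), unlike `np.cov`, so pass `ddof=1` there if you want the corrected version.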

So I don't really see a place in statistics for the biased formula. When applying covariance, and thus also when doing PCA, I would go for the $n-1$ variant.
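For PCA specifically, the choice matters less than it may seem: the two covariance matrices differ only by the scalar factor $\frac{n-1}{n}$, so they have the same eigenvectors (principal directions) and the same explained-variance ratios, and only the eigenvalues are rescaled. A minimal sketch, using randomly generated data and a mixing matrix that are assumptions for illustration only:

```python
import numpy as np

# Hypothetical data: 50 observations of 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([
    [2.0, 0.5, 0.0],
    [0.0, 1.0, 0.3],
    [0.0, 0.0, 0.2],
])
X = rng.normal(size=(50, 3)) @ mixing
n = X.shape[0]

cov_n_minus_1 = np.cov(X, rowvar=False)        # divides by n - 1
cov_n = np.cov(X, rowvar=False, bias=True)     # divides by n

# eigh is appropriate for symmetric matrices; eigenvalues come back ascending.
vals_n_minus_1, vecs_n_minus_1 = np.linalg.eigh(cov_n_minus_1)
vals_n, vecs_n = np.linalg.eigh(cov_n)

# Eigenvalues differ only by the factor (n - 1) / n.
print(np.allclose(vals_n, vals_n_minus_1 * (n - 1) / n))

# Principal directions agree (compared up to sign).
print(np.allclose(np.abs(vecs_n), np.abs(vecs_n_minus_1)))

# Explained-variance ratios are identical.
ratio_n_minus_1 = vals_n_minus_1 / vals_n_minus_1.sum()
ratio_n = vals_n / vals_n.sum()
print(np.allclose(ratio_n, ratio_n_minus_1))
```

So whichever denominator you pick, the components and the share of variance each one explains come out the same; only the absolute eigenvalues change.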

But to stress again: it's not a question of correct or incorrect. In probability theory, when you calculate the variance of a discrete random variable from its known distribution, you do not apply the correction, because nothing is being estimated from a sample. So be careful to choose one or the other deliberately, and to use it consistently, whenever you are dealing with variance.
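As an illustration of the probability-theory case, take a fair six-sided die (my own example, not from the question). The distribution is fully known, so the variance is the probability-weighted sum of squared deviations with no correction:

```python
from fractions import Fraction

# A fair six-sided die: every outcome has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

# Exact mean and variance from the known distribution.
mean = sum(p * x for x in outcomes)
variance = sum(p * (x - mean) ** 2 for x in outcomes)

# What you would get by wrongly applying the n - 1 denominator here.
wrongly_corrected = sum((x - mean) ** 2 for x in outcomes) / Fraction(5)

print(mean)               # 7/2
print(variance)           # 35/12
print(wrongly_corrected)  # 7/2
```

The true variance of the die is $\frac{35}{12}$; dividing by $5$ instead of $6$ would give $\frac{7}{2}$, which is not the variance of this random variable.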