Calculating Covariance from data


Wikipedia says that the covariance is:

$$ E[(X_i - \mu_i)(X_j - \mu_j)^T] \text{ for all i,j in the covariance matrix} $$

Right now, my code takes the dot product of data column $i$ minus that column's average with data column $j$ minus that column's average, for every pair $i, j$, and I get a nice $d \times d$ matrix (where $d$ is the number of data features, i.e. the number of columns).
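To make the computation described above concrete, here is a minimal sketch. It assumes the data sits in a NumPy array with one observation per row and one feature per column; the array `X` and its values are made up for illustration. Note that the result is a sum of products over the rows, not yet an average, so its entries grow with the number of observations.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows), d = 3 features (columns).
X = np.array([
    [2.0, 1.0, 0.5],
    [4.0, 3.0, 1.5],
    [6.0, 2.0, 2.5],
    [8.0, 5.0, 2.0],
    [10.0, 4.0, 3.5],
])
n, d = X.shape

# Subtract each column's mean from that column.
centered = X - X.mean(axis=0)

# Dot product of every centered column with every centered column.
# Entry (i, j) is sum over rows k of centered[k, i] * centered[k, j].
scatter = centered.T @ centered

print(scatter.shape)  # (3, 3)
```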

My question is: is that the covariance matrix? Can someone please confirm?

Or do I have to do something extra? The $E[...]$ part of the Wikipedia definition is throwing me off. Does it mean that I need to do something more?

(Also, I got a covariance value of $116$ at the $(0,0)$ location, with a mean of $13$ for that column vector, so I am not sure whether that is correct.)

Update

In a comment below, someone mentioned that I could estimate the expectation by dividing the resulting $d \times d$ matrix by $d-1$. Why would one do that?
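For reference, here is the usual sample estimate of the expectation, written out under the assumption that the data has $n$ observations (rows) and that $\bar{x}_i$ is the average of column $i$; these symbols are introduced here and are not part of the definition quoted above:

$$ \hat{\Sigma}_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j) $$

In this form the divisor is $n-1$, the number of observations minus one, rather than anything based on the column count $d$. Dividing by $n$ would give a plain average; $n-1$ is the common correction for having estimated the column means from the same data.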

And also, NumPy does not seem to implement the expectation step. Does that mean it is correct to skip it on a finite data set collected from observations?
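One way to check what NumPy does is to compare a hand-rolled result against `np.cov`. The sketch below assumes one observation per row, which is why it passes `rowvar=False`; the random data and variable names are hypothetical. By default `np.cov` normalizes by the number of observations minus one, so the comparison divides the centered dot product by `n - 1`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # hypothetical: 50 observations, 4 features
n = X.shape[0]

# Centered dot product, then divide by (n - 1) to turn the sum into an estimate.
centered = X - X.mean(axis=0)
by_hand = centered.T @ centered / (n - 1)

# rowvar=False tells np.cov that columns (not rows) are the variables.
from_numpy = np.cov(X, rowvar=False)

print(np.allclose(by_hand, from_numpy))
```

If the two matrices agree, the averaging step is being applied inside `np.cov` rather than skipped.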