My understanding is that given matrix X, I can find its corresponding covariance matrix by:
- finding the means of each column
- subtracting each mean from each value in its respective column and
- multiplying the resulting matrix by its own transpose. Let's call this matrix C.
Here is what it would look like in Python:
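import numpy
Y = X - numpy.mean(X, axis=0)  # subtract each column's mean from that column
C = numpy.dot(Y, Y.T)          # multiply the centered matrix by its own transpose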
If I do this, I can prove mathematically (and experimentally using some simple Python code) that det(C) = 0 always.
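A minimal sketch of what such an experimental check might look like. The matrix sizes, the seed, and the random test data are arbitrary assumptions for illustration, not the original experiment:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
n, m = 5, 3                      # hypothetical sizes: 5 rows, 3 columns
X = rng.normal(size=(n, m))      # random test data

Y = X - np.mean(X, axis=0)       # subtract each column mean
C = np.dot(Y, Y.T)               # n x n product, as in the steps above

rank_C = np.linalg.matrix_rank(C)
det_C = np.linalg.det(C)
print(C.shape, rank_C, det_C)    # rank is below n; determinant is zero up to rounding
```

Comparing the rank with `n` is usually more reliable than looking at the determinant alone, since rounding can leave a tiny nonzero value.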
However, a colleague tells me that using the inverse of a covariance matrix is common in his field and he showed me some R code to demonstrate.
> det(cov(swiss))
[1] 244394171542
I notice that R has several ways of calculating the covariance matrix that lead to different results. I also notice from Googling that some people say the covariance matrix is always singular (e.g. here) whereas others say it is not.
So, my question is: why the differences of opinion and what's the true answer?
EDIT: I discovered that the determinant is only zero if the matrix is square. If anybody knows the proof for this or can throw some further light on the matter, I'd be grateful.
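The observation in the EDIT can be explored numerically. The sketch below is only illustrative and rests on a few assumptions: the data are random, the helper `centered` is a hypothetical name, and R's `cov` is taken to treat columns as variables with an $n-1$ denominator, which corresponds to `numpy.cov(X, rowvar=False)`. It compares the $n \times n$ product from the Python code above with the $m \times m$ product $Y^T Y/(n-1)$:

```python
import numpy as np

def centered(X):
    """Subtract each column's mean from that column (hypothetical helper)."""
    return X - X.mean(axis=0)

rng = np.random.default_rng(1)            # arbitrary seed

# Tall case: more rows (observations) than columns (variables).
X_tall = rng.normal(size=(10, 3))
Y = centered(X_tall)
gram_rows = Y @ Y.T                       # 10 x 10, the product used in the question
cov_cols = Y.T @ Y / (Y.shape[0] - 1)     # 3 x 3, sample covariance of the columns

same_as_numpy = np.allclose(cov_cols, np.cov(X_tall, rowvar=False))
rank_rows = np.linalg.matrix_rank(gram_rows)   # at most 3, so the 10 x 10 matrix is singular
rank_cols = np.linalg.matrix_rank(cov_cols)    # 3 for generic data, so invertible

# Square case: as many rows as columns.
X_sq = rng.normal(size=(4, 4))
Ys = centered(X_sq)
cov_sq = Ys.T @ Ys / (Ys.shape[0] - 1)
rank_sq = np.linalg.matrix_rank(cov_sq)        # at most 3 < 4, so singular

print(same_as_numpy, rank_rows, rank_cols, rank_sq)
```

The centered matrix has rank at most $\min(n-1, m)$, so the $m \times m$ version can only be invertible when $m \le n-1$. That is consistent with the determinant vanishing when $X$ is square.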
Yes, your statement is true. Let $X$ be an $n\times m$ matrix. If you follow the algorithm you wrote above up to step 2 (subtracting the column means), you obtain a matrix $Z$ in which each column sums to $0$:
$$\forall j=1,\cdots,m \quad x_{1,j}-\sum_{i=1}^n\frac{x_{i,j}}{n}+\cdots+ x_{n,j}-\sum_{i=1}^n\frac{x_{i,j}}{n}=\sum_{i=1}^nx_{i,j}-n\sum_{i=1}^n\frac{x_{i,j}}{n}=0$$
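As a quick sanity check of the column-sum identity, here is a small sketch on an arbitrary $3\times 2$ example (the numbers are made up for illustration):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [8.0, 13.0]])       # arbitrary 3 x 2 example
Z = X - X.mean(axis=0)            # subtract each column mean
col_sums = Z.sum(axis=0)          # one sum per column of Z
print(col_sums)                   # zero up to floating-point rounding
```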
Now if you form the product $Y=Z \cdot Z^T$ (the matrix you called $C$), the statement remains true: each column of $Y$ sums to $0$, and since $Y$ is symmetric, each row does too. For example, for the first column:
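$$y_{1,1}=\sum_{i=1}^m z_{1,i}^2$$
$$y_{2,1}=\sum_{i=1}^m z_{2,i}z_{1,i}$$
$$\vdots$$
$$y_{n,1}=\sum_{i=1}^m z_{n,i}z_{1,i}$$
So
$$\sum_{j=1}^n y_{j,1}=\sum_{i=1}^m \Big( z_{1,i}^2+z_{2,i}z_{1,i}+ \cdots + z_{n,i}z_{1,i}\Big)=\sum_{i=1}^m z_{1,i}\left( z_{1,i}+z_{2,i}+ \cdots + z_{n,i} \right)=0,$$
because the bracket is the sum of column $i$ of $Z$, which is $0$. The same holds for columns $2,3, \cdots, n$. Hence the rows of $Y$ add up to the zero vector, so they are linearly dependent and $\det(Y)=0$.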
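The argument can also be checked numerically. This is a sketch under assumed random data (sizes and seed are arbitrary), verifying that the columns of $Y = Z Z^T$ sum to zero and that the all-ones vector lies in the null space of $Y$:

```python
import numpy as np

rng = np.random.default_rng(2)        # arbitrary seed
Z = rng.normal(size=(6, 4))           # hypothetical 6 x 4 data
Z = Z - Z.mean(axis=0)                # center each column, so columns of Z sum to 0

Y = Z @ Z.T                           # 6 x 6 symmetric product
ones = np.ones(Y.shape[0])

col_sums = Y.sum(axis=0)              # column sums of Y
null_check = Y @ ones                 # row sums of Y, i.e. Y applied to the ones vector
rank_Y = np.linalg.matrix_rank(Y)

print(np.allclose(col_sums, 0), np.allclose(null_check, 0), rank_Y)
```

A nonzero vector mapped to zero is exactly what forces the determinant to vanish.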