What does Determinant of Covariance Matrix give?


I am representing my 3-D data by its sample covariance matrix, and I want to know what the determinant of that covariance matrix represents. If the determinant is positive, zero, negative, large and positive, or large and negative, what does that mean?

Thanks

EDIT:

The covariance matrix is being used to represent the variance of 3-D coordinates that I have. If the determinant of covariance matrix A is +100 and the determinant of covariance matrix B is +5, which value indicates more variance? Which value tells me the data points are more dispersed, i.e., that the readings are further from the mean?
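As a concrete sketch of the comparison being asked about (with made-up numbers, not the asker's data), one can simulate two 3-D point clouds, compute their sample covariance matrices with numpy, and compare the determinants:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical 3-D point clouds: A is more spread out per axis than B.
A = rng.normal(0.0, 3.0, size=(1000, 3))
B = rng.normal(0.0, 1.0, size=(1000, 3))

# Sample covariance matrices (rowvar=False: each row is one observation).
cov_A = np.cov(A, rowvar=False)
cov_B = np.cov(B, rowvar=False)

det_A = np.linalg.det(cov_A)
det_B = np.linalg.det(cov_B)

# The cloud with the larger determinant is the more dispersed one.
print(det_A, det_B)
assert det_A > det_B
```

So in the question's terms, a determinant of +100 versus +5 would indicate the first data set occupies more volume around its mean.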


There are 4 best solutions below

Answer 1 (5 votes)

It cannot be negative, since the covariance matrix is positive semi-definite (not necessarily positive definite). All of its eigenvalues are therefore non-negative, and the determinant is the product of these eigenvalues. Its square root measures, in a certain sense, the volume of the $n$-dimensional ($n=3$ in your case) $\sigma$-cube; it is the analogue of $\sigma$ in the one-dimensional case.
Notice that the multivariate normal distribution is defined as $$ f_{\mathbf x}(x_1,\ldots,x_k) = \frac{1}{\sqrt{(2\pi)^k|\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu}) \right), $$ where $|\boldsymbol\Sigma|$ is the determinant of $\boldsymbol\Sigma$.
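The two claims above (non-negative eigenvalues, determinant as their product) can be checked numerically; the matrix below is just an illustrative positive-definite example:

```python
import numpy as np

# An example 3-D covariance matrix (symmetric, diagonally dominant,
# hence positive definite).
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

eigvals = np.linalg.eigvalsh(Sigma)   # eigenvalues of a symmetric matrix
det = np.linalg.det(Sigma)

# All eigenvalues of a covariance matrix are non-negative ...
assert (eigvals >= 0).all()
# ... and the determinant is their product.
assert np.isclose(det, np.prod(eigvals))

# sqrt(det) is the multivariate analogue of the 1-D sigma:
# it scales like the volume of the 1-sigma ellipsoid.
print(np.sqrt(det))
```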

Answer 2 (2 votes)

I would like to point out that there is a connection between the determinant of the covariance matrix of (Gaussian distributed) data points and the differential entropy of the distribution.

To put it in other words: say you have a (large) set of points which you assume to be Gaussian distributed. If you compute the determinant of the sample covariance matrix, then you measure (indirectly) the differential entropy of the distribution, up to constant factors and a logarithm. See, e.g., Multivariate normal distribution.

The differential entropy of a Gaussian density is defined as:

$$H[p] = \frac{k}{2}(1 + \ln(2\pi)) + \frac{1}{2} \ln \vert \Sigma \vert\;,$$

where $k$ is the dimensionality of your space, i.e., in your case $k=3$.

$\Sigma$ is positive semi-definite, which means $\vert \Sigma \vert \geq 0$.

The larger $\vert \Sigma \vert$, the more dispersed your data points are. If $\vert \Sigma \vert = 0$, your data points do not 'occupy the whole space', meaning that they lie, e.g., on a line or a plane within $\mathbb{R}^3$. I have read that $\vert \Sigma \vert$ is also called the generalized variance. Alexander Vigodner is right: it captures the volume of your data cloud.
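A quick self-consistency check of the entropy formula above, using only numpy: for a diagonal $\Sigma$ the components are independent, so the joint differential entropy must equal the sum of the univariate Gaussian entropies $\frac{1}{2}\ln(2\pi e \sigma_i^2)$ (the variance values below are arbitrary):

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy of N(mu, Sigma); note it does not depend on mu."""
    k = Sigma.shape[0]
    return 0.5 * k * (1.0 + np.log(2.0 * np.pi)) \
        + 0.5 * np.log(np.linalg.det(Sigma))

# For diagonal Sigma, the components are independent, so the joint
# entropy is the sum of the univariate entropies 0.5*ln(2*pi*e*sigma_i^2).
variances = np.array([4.0, 1.0, 0.25])
Sigma = np.diag(variances)
univariate_sum = np.sum(0.5 * np.log(2.0 * np.pi * np.e * variances))

assert np.isclose(gaussian_entropy(Sigma), univariate_sum)
```

This also makes the dependence on $\vert \Sigma \vert$ explicit: doubling every variance increases the entropy by $\frac{k}{2}\ln 2$.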

Since the sample covariance matrix is defined as $$\Sigma = \frac{1}{N-1} \sum_{i=1}^N (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T\;, $$ it follows that it captures no information about the mean. You can verify that easily by adding a large constant vectorial shift to your data; $\vert \Sigma \vert$ should not change.
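That verification takes only a few lines (the shift vector here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))

det_before = np.linalg.det(np.cov(X, rowvar=False))

# Add a large constant shift to every point: the sample covariance
# subtracts the (shifted) mean, so its determinant is unchanged.
shift = np.array([1e3, -2e3, 5e3])
det_after = np.linalg.det(np.cov(X + shift, rowvar=False))

assert np.isclose(det_before, det_after, rtol=1e-6)
```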

I don't want to go into too much detail, but there is also a connection to PCA. Since the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of $\Sigma$ are the variances along the principal component axes of your data points, $\vert \Sigma \vert$ captures their product, because by definition the determinant of a matrix equals the product of its eigenvalues.

Note that the largest eigenvalue corresponds to the maximal variance in your data (the direction is given by the corresponding eigenvector; see PCA).
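The PCA connection can be demonstrated directly: projecting the data onto the eigenvectors of the sample covariance matrix recovers the eigenvalues as per-axis variances, and their product is $\vert \Sigma \vert$ (the covariance used to generate the data is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated 3-D data from an example covariance.
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[2.0, 0.8, 0.3],
                                 [0.8, 1.5, 0.2],
                                 [0.3, 0.2, 1.0]],
                            size=2000)

S = np.cov(X, rowvar=False)

# PCA: eigenvalues of S are the variances along the principal axes.
eigvals, eigvecs = np.linalg.eigh(S)              # ascending order
pc_variances = np.var(X @ eigvecs, axis=0, ddof=1)

assert np.allclose(np.sort(pc_variances), eigvals)
# |S| is the product of those per-axis variances.
assert np.isclose(np.linalg.det(S), np.prod(eigvals))
```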

Answer 3 (1 vote)

It might help to break down the parts "determinant" and "covariance".

The determinant gives you the magnitude of a matrix transformation: the factor by which it scales volumes. You could think of it as how "big" the transformation is.

The covariance matrix tells you how the variables vary with each other.

Thus the determinant of the covariance matrix gives you a measure of the overall magnitude of how much the variables "vary", individually and with each other.

In the case of comparing a determinant of matrix A of 100 with a determinant of matrix B of 5, the smaller determinant suggests that the data behind matrix B is less dispersed overall: either its variables have smaller individual variances, or they are more strongly linearly dependent on each other, than the variables behind matrix A.
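The dependence effect in isolation can be illustrated with two 2×2 covariance matrices that have identical marginal variances but different correlations (a constructed example, not the asker's matrices A and B):

```python
import numpy as np

# Identical marginal variances (both 1), different correlation.
A = np.array([[1.0, 0.9],
              [0.9, 1.0]])   # correlation 0.9
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # uncorrelated

# In 2-D, |Sigma| = sigma1^2 * sigma2^2 * (1 - rho^2), so at fixed
# variances, stronger correlation shrinks the determinant.
assert np.isclose(np.linalg.det(A), 1 - 0.9**2)
assert np.linalg.det(A) < np.linalg.det(B)
```

This is why a small determinant alone cannot distinguish "small variances" from "highly correlated variables"; both compress the volume of the data cloud.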

Answer 4 (1 vote)

The determinant of the covariance matrix was termed the generalized variance by Wilks in 1932. Comparing the densities of the univariate and multivariate normal, it is easy to see that $|\Sigma|$ plays a role similar to that of $\sigma^2$.

This has several interpretations (see, for example, Anderson 2003, Section 7.5):

  • A geometric interpretation: it is proportional to the volume of the ellipsoid $\left\{u \in \mathcal{R}^{k} \mid(u-\mu)^{\prime} \Sigma^{-1}(u-\mu)=c^{2}\right\}$
  • An entropy interpretation, as discussed by @tmp
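The geometric interpretation can be checked numerically in two dimensions, where the ellipse $\{u : u^{\prime}\Sigma^{-1}u \le c^2\}$ has area $\pi c^2 \sqrt{|\Sigma|}$; the sketch below (with an arbitrary example $\Sigma$ and $c$) compares that formula against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(3)

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
Sigma_inv = np.linalg.inv(Sigma)
c = 1.5

# Analytic area of the ellipse {u : u' Sigma^{-1} u <= c^2} in 2-D.
analytic_area = np.pi * c**2 * np.sqrt(np.linalg.det(Sigma))

# Monte Carlo: sample a bounding box that encloses the ellipse
# and count the fraction of points inside it.
half = 6.0
pts = rng.uniform(-half, half, size=(400_000, 2))
quad_form = np.einsum('ij,jk,ik->i', pts, Sigma_inv, pts)
mc_area = (quad_form <= c**2).mean() * (2 * half) ** 2

assert abs(mc_area - analytic_area) / analytic_area < 0.05
```

The same proportionality to $\sqrt{|\Sigma|}$ holds in higher dimensions, with the constant depending on $k$ and $c$.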

Relation to generalized correlation?

If $|\Sigma|$ is the generalized variance, is there also a generalized correlation? The quantity $\sqrt{1-\frac{|\Sigma|}{\sigma_1^2\cdot\ldots\cdot\sigma^2_N}}$ is sometimes called the collective correlation coefficient. You can verify that for $N=2$ it reduces to the absolute value of the usual correlation coefficient: $\sqrt{1-\frac{\sigma_1^2\sigma_2^2-\rho^2 \sigma_1^2\sigma_2^2}{\sigma_1^2\sigma^2_2}}=\sqrt{1-(1-\rho^2)}=|\rho|$.
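The $N=2$ case is easy to check numerically (the variances and correlation below are arbitrary example values):

```python
import numpy as np

sigma1, sigma2, rho = 2.0, 0.5, -0.6

Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2            ]])

# Collective correlation: sqrt(1 - |Sigma| / (sigma1^2 * ... * sigmaN^2)).
collective = np.sqrt(1 - np.linalg.det(Sigma) / (sigma1**2 * sigma2**2))

# For N = 2 this reduces to |rho| (the square root is non-negative,
# so the sign of rho is lost).
assert np.isclose(collective, abs(rho))
```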

References

Anderson, T. W., An introduction to multivariate statistical analysis., Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley (ISBN 0-471-36091-0/hbk). xx, 721 p. (2003). ZBL1039.62044.

Wilks, S. S., Certain generalizations in the analysis of variance., Biometrika 24, 471-494 (1932). ZBL58.1172.02.