Why is it important for a correlation matrix to be positive semidefinite?


Now I understand the definition of positive semidefiniteness, but I am struggling to understand why a correlation matrix must be positive semidefinite. What are the effects of negative eigenvalues in relation to correlation matrices?


There are 2 answers below.

BEST ANSWER

Note that I've written this answer in terms of covariance matrices, but a correlation matrix is simply a scaled covariance matrix where

$\rho_{i,j}=\frac{C_{i,j}}{\sqrt{C_{i,i}} \sqrt{C_{j,j}}}$.

It's relatively easy to show that $C$ is positive semidefinite if and only if $\rho$ is positive semidefinite.
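As a quick numerical illustration (a sketch using NumPy; the example matrix and variable names are mine), scaling a covariance matrix to a correlation matrix this way doesn't change whether the eigenvalues are nonnegative:

```python
import numpy as np

# Build an example covariance matrix: A @ A.T is positive semidefinite for any A.
A = np.array([[2.0, 1.0, 0.0],
              [0.5, 1.5, 1.0],
              [0.0, 1.0, 1.0]])
C = A @ A.T

# Scale to a correlation matrix: rho_ij = C_ij / sqrt(C_ii * C_jj).
d = np.sqrt(np.diag(C))
rho = C / np.outer(d, d)

print(np.diag(rho))               # all ones, as a correlation matrix requires
print(np.linalg.eigvalsh(C))      # all nonnegative (up to roundoff)
print(np.linalg.eigvalsh(rho))    # likewise all nonnegative
```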

This answer is written in two parts because the term "covariance matrix" is sometimes used for the sample covariance matrix of a data set and other times for the theoretical covariance matrix of jointly distributed random variables.

First, the sample covariance.

Given a sequence of data vectors $x^{(1)}$, $x^{(2)}$, $\ldots$, $x^{(n)}$, let $X$ be the matrix whose columns are these vectors:

$X=\left[ x^{(1)}, x^{(2)}, \ldots, x^{(n)} \right]$.

The sample mean is

$\bar{x}=\frac{\sum_{i=1}^{n} x^{(i)}}{n}$.

It's convenient to define

$\bar{X}=\bar{x}\left[1, 1, \ldots, 1 \right]$

to get a matrix whose columns are each $\bar{x}$.

The sample covariance matrix is

$C=\frac{1}{n-1} \left( XX^{T} - \bar{X}\bar{X}^{T} \right)$.

To see that this matrix is positive semidefinite, first note that $XX^{T}-\bar{X}\bar{X}^{T}=(X-\bar{X})(X-\bar{X})^{T}$, since $X\bar{X}^{T}=\bar{X}X^{T}=n\bar{x}\bar{x}^{T}=\bar{X}\bar{X}^{T}$. Now take any vector $z$ in $\mathbb{R}^{d}$, where $d$ is the dimension of the data vectors. Then

$z^{T}Cz=\frac{1}{n-1} z^{T}(X-\bar{X})(X-\bar{X})^{T}z$.

$z^{T}Cz=\frac{1}{n-1} ((X-\bar{X})^{T}z)^{T}((X-\bar{X})^{T}z)=\frac{1}{n-1}\| (X-\bar{X})^{T}z \|_{2}^{2} \geq 0$.

Thus $C$ is positive semidefinite.

Note that, in practice, particularly if $n$ is smaller than the dimension of the vectors $x$, $C$ can be numerically singular and might have what appear to be small negative eigenvalues.
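This is easy to see numerically (a sketch with NumPy; the sizes and seed are arbitrary choices of mine). With fewer samples than dimensions, the sample covariance matrix is rank-deficient, so it has eigenvalues that are exactly zero in theory, and floating-point roundoff can push them slightly negative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 4                        # dimension larger than the number of samples
X = rng.standard_normal((d, n))     # columns are the data vectors x^(i)

# Sample covariance via the centered form C = (X - Xbar)(X - Xbar)^T / (n - 1).
xbar = X.mean(axis=1, keepdims=True)
C = (X - xbar) @ (X - xbar).T / (n - 1)

eigs = np.linalg.eigvalsh(C)
print(eigs)                         # at most n-1 strictly positive; the rest near zero
print(np.linalg.matrix_rank(C))     # at most n-1
```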

Next, the covariance matrix of a jointly distributed vector of random variables.

Suppose that the random variables $X_{1}$, $X_{2}$, $\ldots$, $X_{n}$ are jointly distributed, and write $X$ for the column vector $\left[X_{1}, X_{2}, \ldots, X_{n}\right]^{T}$. We'll assume that $E[X]=0$, but it's an easy exercise to extend this to a nonzero mean $\mu$. Then

$\mbox{Cov}(X)=C=E[XX^{T}]$.

To show that $C$ is positive semidefinite, take any vector $z$ in $R^{n}$, and consider the random variable

$W=z^{T}X=z_{1}X_{1}+z_{2}X_{2}+\ldots+z_{n}X_{n}$.

Since $E[X_{i}]=0$ for each $i$, $E[W]=0$, and

$\mbox{Var}(W)=E[W^{2}]$.

$\mbox{Var}(W)=E[(z^{T}X)^{2}]=E[z^{T}XX^{T}z]=z^{T}E[XX^{T}]z$.

$\mbox{Var}(W)=z^{T}Cz$.

But the variance of every random variable is greater than or equal to 0, so

$\mbox{Var}(W)=z^{T}Cz \geq 0$.
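A quick simulation (a sketch; the distribution, the covariance matrix, and the vector $z$ are arbitrary choices of mine) confirms that the variance of $W=z^{T}X$ matches the quadratic form $z^{T}Cz$, and in particular is nonnegative:

```python
import numpy as np

rng = np.random.default_rng(1)

# A positive definite covariance matrix for the mean-zero random vector X.
C = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.5]])
z = np.array([1.0, -2.0, 0.5])

# Draw many samples of X ~ N(0, C) and form W = z^T X for each sample.
X = rng.multivariate_normal(np.zeros(3), C, size=200_000)
W = X @ z

# The empirical Var(W) agrees with z^T C z up to sampling error; both are >= 0.
print(W.var(), z @ C @ z)
```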

In practice, we usually try to avoid working with covariance matrices that are positive semidefinite but singular and prefer to work with covariance matrices that are actually positive definite.


The fact that a correlation matrix is positive semidefinite (p.s.d.) is a property, not a desired attribute. Note that this is a theoretical fact: some algorithms may generate matrices with negative eigenvalues due to computational and floating-point error.

Now let's take this discussion one step back so we can understand it better afterwards. The correlation matrix is part of a decomposition of the covariance matrix, as shown below: $$ \Sigma = \operatorname{diag}(\sigma) C \operatorname{diag}(\sigma) $$ where $\operatorname{diag}(\sigma)$ is a diagonal matrix with the standard deviations as its entries. Also notice that a p.s.d. matrix satisfies the following:

$$\forall x \in \mathbb{R}^{n}, \; x^\top\Sigma x \geq 0$$

A useful fact about p.s.d. matrices that follows from the above statement is that their diagonal elements $\Sigma_{ii}$ are greater than or equal to $0$: just choose $x_{j}=0$ if $j \neq i$ and $x_{j}=1$ if $j = i$, which gives $x^{\top}\Sigma x=\Sigma_{ii}\geq 0$.

So, we know the correlation matrix is p.s.d.. What does that mean?

Notice that if $C$ isn't p.s.d. you may get negative variances on the diagonal, as pointed out by user @leonbloy in the comments. Notice also that $C$ p.s.d. implies $\Sigma$ p.s.d., because for every matrix $P$, congruence transformations of the form $P^{\top} C P$ preserve positive semidefiniteness; here $P=\operatorname{diag}(\sigma)$, which is symmetric.
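A small numerical check of this congruence argument (a sketch; the values of $C$ and $\sigma$ are mine, with $P=\operatorname{diag}(\sigma)$):

```python
import numpy as np

# A positive semidefinite correlation matrix C and arbitrary standard deviations.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
sigma = np.array([2.0, 0.5, 3.0])

# P is diagonal, hence symmetric, so P C P is the congruence P^T C P.
P = np.diag(sigma)
Sigma = P @ C @ P

print(np.linalg.eigvalsh(C))      # all >= 0
print(np.linalg.eigvalsh(Sigma))  # still all >= 0
```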

Another interesting property is that if $\Sigma$ is positive definite then $\Sigma^{-1}$ is positive definite as well. This leads us to another problematic fact: if $\Sigma$ is not positive definite, we can't guarantee the following expression is nonnegative, nor convex in $y$: $$ (y-\mu)^\top\Sigma^{-1}(y-\mu) $$ This is a commonly encountered expression in statistics, known as the Mahalanobis distance, and it also appears in the multivariate Gaussian distribution. Let $X$ be $MVN(\boldsymbol\mu, \boldsymbol\Sigma)$; then its density is given by

$$f_{\mathbf X}(x_1,\ldots,x_k) = \frac{\exp\left(-\frac 1 2 ({\mathbf x}-{\boldsymbol\mu})^\mathrm{T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right)}{\sqrt{(2\pi)^k|\boldsymbol\Sigma|}}$$

Notice that the best case is a positive definite $C$, because a merely p.s.d. matrix lets one or more variables be constant (a $0$ on the diagonal means a degenerate dimension, so it's better to work in $N-1$ dimensions) or an exact linear combination of the others; in that case $|\boldsymbol\Sigma|=0$, $\boldsymbol\Sigma^{-1}$ does not exist, and the $MVN$ pdf above becomes undefined.
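To see why this matters in practice, here is a sketch (NumPy; the matrix entries are mine) of a symmetric matrix with unit diagonal and entries in $[-1,1]$ that is nevertheless not a valid correlation matrix: it has a negative eigenvalue, so the Cholesky factorization typically used to sample from a $MVN$ fails on it:

```python
import numpy as np

# Symmetric, unit diagonal, entries in [-1, 1], yet NOT positive semidefinite:
# X1 and X2 are strongly positively correlated, so their correlations with X3
# cannot point in opposite directions this strongly.
R = np.array([[ 1.0,  0.9, -0.9],
              [ 0.9,  1.0,  0.9],
              [-0.9,  0.9,  1.0]])

print(np.linalg.eigvalsh(R))      # one eigenvalue is negative

try:
    np.linalg.cholesky(R)         # step used when sampling from a MVN
except np.linalg.LinAlgError as e:
    print("Cholesky failed:", e)
```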