Why does a covariance matrix have to be positive semi-definite, on an intuitive level?


If I have a random vector $X$ with expectation $\mu \in \mathbb{R}^n$, it is not that difficult to show that its covariance matrix is positive semi-definite: given $$ \operatorname{Cov}(X) = \mathbb{E}[(X-\mu) (X-\mu)^T] $$ for any vector $z\in \mathbb{R}^n$ we have $$ z^T \operatorname{Cov}(X) z = z^T \mathbb{E}[(X-\mu) (X-\mu)^T] z = \mathbb{E}[z^T(X-\mu) (X-\mu)^T z], $$ which is the expectation of a squared inner product, $$ \mathbb{E}[z^T(X-\mu) (X-\mu)^T z] = \mathbb{E}[\langle z, X-\mu\rangle^2] \geq 0, $$ and hence always greater than or equal to $0$.
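This argument can be checked numerically. A minimal sketch with NumPy; the sample size, dimension, and mixing matrix below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: 1000 samples of a 3-dimensional random vector X,
# built by mixing independent Gaussians with an arbitrary matrix.
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.5]])
X = rng.normal(size=(1000, 3)) @ A

cov = np.cov(X, rowvar=False)  # sample covariance matrix (3 x 3)

# For any vector z, the quadratic form z^T Cov(X) z is non-negative.
z = rng.normal(size=3)
q = z @ cov @ z
assert q >= 0

# Equivalently, all eigenvalues of the covariance matrix are >= 0
# (up to floating-point rounding).
eigvals = np.linalg.eigvalsh(cov)
assert np.all(eigvals >= -1e-12)
```

Checking the eigenvalues of `np.linalg.eigvalsh` is equivalent to checking the quadratic form for every $z$ at once, since a symmetric matrix is PSD exactly when all its eigenvalues are non-negative.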

What I don't really understand, however, is why a non-PSD matrix cannot function as a covariance matrix, on an intuitive level. Suppose you have a non-PSD matrix. Can you prove by contradiction that it cannot be the covariance matrix of any random vector $X$?


Best answer:

Basically, I use the same argument as you from a slightly different perspective. Suppose that $\Sigma=\operatorname E[(X-\operatorname EX)(X-\operatorname EX)']$ is a covariance matrix of a random vector $X$ such that $\operatorname E\|X\|^2<\infty$ (this is just to ensure that the covariance matrix is well-defined). Now suppose that $\Sigma$ is not positive semi-definite. This means that there exists some vector $a$ such that $$ a'\Sigma a<0. $$ It follows that \begin{align*} a'\Sigma a &=a'\operatorname E[(X-\operatorname EX)(X-\operatorname EX)']a\\ &=\operatorname E[a'(X-\operatorname EX)(X-\operatorname EX)'a]\\ &=\operatorname{Var}(a'X)\\ &<0. \end{align*} Observe that $$ a'X = a_1X_1+\ldots+a_dX_d. $$ This means that there exists some linear combination of the entries of $X$ such that the variance of this linear combination is negative which of course does not make sense.
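The contradiction above can be made concrete with a small symmetric matrix that has a negative eigenvalue (the matrix below is a made-up example):

```python
import numpy as np

# A symmetric matrix that is NOT positive semi-definite:
# trace 2, determinant -3, so its eigenvalues are 3 and -1.
S = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
a = eigvecs[:, 0]                     # unit eigenvector for eigenvalue -1

# The quadratic form a' S a equals -1 here. If S were Cov(X), this
# would be Var(a_1 X_1 + a_2 X_2) -- a negative variance.
assert a @ S @ a < 0

# Consistently, a Cholesky factorization (which would let us sample a
# vector X with Cov(X) = S) fails for a non-PSD matrix.
try:
    np.linalg.cholesky(S)
except np.linalg.LinAlgError:
    print("Cholesky factorization failed: S is not PSD")
```

So any attempt to realize $S$ as the covariance of an actual random vector breaks down exactly where the answer says it must: some linear combination of the coordinates would need negative variance.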

I hope this is useful.

Another answer:

On a hand-waving, physics-intuition level:

A covariance matrix is analogous to the moment-of-inertia matrix in physics: both are second central moments
(cf. https://en.wikipedia.org/wiki/Moment_(mathematics)).

So if $\omega$ is the angular velocity (a vector) of a rigid body and $I_C$ its inertia matrix
(cf. https://en.wikipedia.org/wiki/Moment_of_inertia#Motion_in_space_of_a_rigid_body,_and_the_inertia_matrix),
the rotational kinetic energy is $\frac 1 2 \omega^T I_C \omega$.
The eigenvectors of $I_C$ are orthogonal and form the principal axes of the inertia ellipsoid.

When the system is rotating, it necessarily (from a physical viewpoint) has more energy than when it is at rest: we have to provide energy to the system to set it in motion. So $\omega^T I_C \omega$ is necessarily $\ge 0$.
Moreover, the energy we provide is always $>0$, except if there is no mass away from the rotation axis, i.e. the whole object lies on the rotation axis.
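As a small numerical check of this analogy, here is a toy rigid body of positive point masses (the positions and masses are made up); its inertia matrix is PSD, just like a covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical rigid body: point masses m_i at positions r_i, recentered
# so the center of mass is at the origin (the "second central moment"
# picture).
r = rng.normal(size=(5, 3))
m = rng.uniform(0.1, 2.0, size=5)
r = r - np.average(r, axis=0, weights=m)

# Inertia matrix: I_C = sum_i m_i (||r_i||^2 Id - r_i r_i^T)
I_C = sum(mi * (ri @ ri * np.eye(3) - np.outer(ri, ri))
          for mi, ri in zip(m, r))

# Rotational kinetic energy (1/2) w^T I_C w is >= 0 for every angular
# velocity w, because each mass contributes
# m_i (||r_i||^2 ||w||^2 - (r_i . w)^2) >= 0 by Cauchy-Schwarz.
w = rng.normal(size=3)
energy = 0.5 * w @ I_C @ w
assert energy >= 0
assert np.all(np.linalg.eigvalsh(I_C) >= -1e-12)
```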

The only drawback of this explanation is that masses in physics are always $>0$, so we are not in the fully general mathematical case. But I still find this analogy interesting, especially with regard to distribution shape, principal component analysis, etc. If someone could provide a more precise answer that takes negative masses into account, I would be grateful.