Covariance matrix having lower rank than the dimension of the r.v. implies observations lie on a lower-dimensional hyperplane, but why?


Let $x$ be a $p$-dimensional random vector, and suppose we have a set of $n$ observations which we collect in the data matrix $X \in \mathbb{R}^{n \times p}$. For simplicity assume $n > p$ and that the data have already been centered. The sample covariance matrix $$S = \frac{1}{n-1}X^TX=\frac{1}{n-1}\sum_{i}x_ix_i^T$$ can then, in principle, have full rank.
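For concreteness, here is a minimal numpy sketch (not part of the question, purely illustrative: the dimensions and random data are made up) of building $S$ from a centered data matrix and checking its rank:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)                          # center the columns

S = X.T @ X / (n - 1)                           # sample covariance
print(np.allclose(S, np.cov(X, rowvar=False)))  # True: same as numpy's estimator
print(np.linalg.matrix_rank(S))                 # typically p = 3 for generic data
```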

Now we know that $S$ is at least positive semi-definite. Supposing it is only positive semi-definite (i.e. singular), apparently this implies that our observations actually lie on a lower-dimensional hyperplane. Seeing this in one direction is simple: suppose $y_i:=Ax_i$, where $A \in \mathbb{R}^{q \times p}$ and $q>p$. That is, we apply a tall matrix to $x$ so that it looks like it lives in a higher dimension. Let $S_y$ denote the covariance matrix of $y$; then $S_y = ASA^T \in \mathbb{R}^{q \times q}$, but its rank is at most $p$. This shows that 'up-dimensioning' your data results in a singular covariance matrix.
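A quick numerical illustration of this direction (again a sketch with made-up dimensions, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 100, 3, 5
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)              # centered p-dimensional data

A = rng.standard_normal((q, p))     # tall matrix, q > p
Y = X @ A.T                         # rows are y_i = A x_i
S_y = Y.T @ Y / (n - 1)             # equals A S A^T

print(np.linalg.matrix_rank(S_y))   # at most p = 3, so the q x q matrix S_y is singular
```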

On the other hand, how does a singular covariance matrix imply that the data lives on a lower-dimensional hyperplane? Decompose $S = ODO^T$, and suppose $\operatorname{rank}(S) + k = p$. Ordering the eigenvalues so that the zero ones come last, the lower-right $k \times k$ block of $D$ is a zero matrix, so we can throw away the last $k$ columns of $O$; call the resulting matrix $O_{p-k}$, and call the upper-left $(p-k)\times(p-k)$ block of $D$, $D_{p-k}$. Now, since $O_{p-k} \in \mathbb{R}^{p \times (p-k)}$, we can 'rref' it: there exists an elementary matrix (or rather a product of elementary matrices) $M$ such that $MO_{p-k}$ has zeroes in its bottom $k$ rows.

To sum up, so far we have shown that $$S = ODO^T = O_{p-k}D_{p-k}O_{p-k}^T.$$ Now, since $M$ is a product of elementary matrices, applying $M$ to our observations $x_i$ does not drop dimensions, but the covariance matrix of $Mx$ is $S_M = MSM^T = MO_{p-k}D_{p-k}(MO_{p-k})^T$. I'm pretty sure the way I wrote this results in $[S_M]_{i,j}=0$ for every $i > p-k$ and $j > p-k$. Thus, just by applying a full-rank linear transformation, we can show that the data has zero variance along $k$ axes: each observation has the same value in these $k$ coordinates (zero, since the data are centered), and thus the data really only lives in $p-k$ dimensions.
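Here is a small numerical check of this idea (a sketch, not the argument above verbatim: it uses the orthogonal matrix $O^T$ itself as the full-rank transformation instead of constructing the elementary matrix $M$, and the example data are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 100, 4, 2
B = rng.standard_normal((n, p - k))
X = B @ rng.standard_normal((p - k, p))     # data confined to a (p-k)-dim subspace
X = X - X.mean(axis=0)

S = X.T @ X / (n - 1)
eigvals, O = np.linalg.eigh(S)              # eigenvalues in ascending order
O = O[:, ::-1]                              # reorder so the zero eigenvalues come last

Z = X @ O                                   # rows are O^T x_i, a full-rank change of coordinates
print(np.round(Z.var(axis=0, ddof=1), 10))  # last k variances are (numerically) zero
```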

I guess I don't really have a question other than asking whether my reasoning is valid. Is there a simpler way of seeing the second direction, though?

Best Answer

Suppose that $\DeclareMathOperator{\r}{rank}\r(S) < p$, which is equivalent to saying $\r(X^TX)<p$. By the rank-nullity theorem, there must exist a nonzero vector $a=[a_1\;\cdots \;a_p]^T$ in the kernel of $X^TX$. This means that $(X^TX)a=0$, which further implies that $a^TX^TXa=0$, which is equivalent to $$ (Xa)^T(Xa)=0. $$ For any real vector $z \in \mathbb{R}^n$, $z^Tz=0$ implies $z=0$, because $z^Tz=z_1^2+\dots+z_n^2$ is a sum of squares. Therefore, from the above, we conclude that $Xa$ is the zero vector. But for each $i\in \{1,\dots,n\}$, the $i^\text{th}$ component of $Xa$ is simply $x_i\cdot a$, so we have shown that all of the data points lie on the hyperplane defined by $a\cdot x=0$.
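A minimal numerical sketch of this argument (made-up example data, numpy assumed): take $a$ to be an eigenvector of $S$ for eigenvalue zero, which lies in the kernel of $X^TX$, and check that every observation satisfies $a\cdot x = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
B = rng.standard_normal((n, p - 1))
X = B @ rng.standard_normal((p - 1, p))   # rank-deficient by construction
X = X - X.mean(axis=0)

S = X.T @ X / (n - 1)
eigvals, O = np.linalg.eigh(S)            # ascending eigenvalues
a = O[:, 0]                               # eigenvector for the smallest (zero) eigenvalue

print(np.max(np.abs(X @ a)))              # ~0: every row x_i satisfies a . x_i = 0
```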