Why is $X^TX$ being the identity matrix equivalent to saying there is no correlation between features?


This was mentioned in the lecture as one of the assumptions for the problem of finding a hard/soft thresholding estimator. My professor brought up this assumption on another occasion as well, so I think it is important. But I don't see why the design matrix $X$ being orthonormal is equivalent to saying there is no correlation between features. Let $x_1,\dots,x_p$ denote the columns of my design matrix $X$; then orthonormal means that for any $x_i, x_j$ with $i\neq j$, their inner product is $0$ (and each column has unit norm). On the other hand, "no correlation" means Cov$(x_i,x_j)=0$. But the first is a purely linear-algebra statement, whereas the second involves the probability distributions of $x_i$ and $x_j$.


There are 2 solutions below.

BEST ANSWER

The statement is kind of true, with two caveats: (1) you need to center the columns of $X$, and (2) the magnitude of the diagonal elements doesn't matter.

So, the statement should be: for a design matrix $X$ with centered columns, the features are uncorrelated if and only if $X^TX$ is a diagonal matrix.


By the design matrix $X$, one usually means an $n\times p$ matrix whose columns are the features and whose rows are the different observations. Consider any two features $x_j,x_k$ with $j\neq k$. Their empirical covariance is defined as
$$ Cov(x_j,x_k):=\frac{1}{n}\sum_{i=1}^n(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k), $$
where $\bar{x}_j:=n^{-1}\sum_{i=1}^n x_{ij}$. Expanding the product, it is easy to show that
$$ Cov(x_j,x_k)=\frac{1}{n}\sum_{i=1}^n x_{ij}x_{ik}-\bar{x}_j\bar{x}_k=\frac{1}{n}(X^TX)_{jk}-\bar{x}_j\bar{x}_k. $$
Now, if $X^TX$ is a diagonal matrix, then $(X^TX)_{jk}=0$, and hence
$$ Cov(x_j,x_k)=-\bar{x}_j\bar{x}_k. $$
So if $\bar{x}_j=0$ or $\bar{x}_k=0$, in particular after centering the columns of $X$, $X^TX$ being diagonal implies that any two distinct features $x_j$ and $x_k$ are uncorrelated.
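As a quick numerical sanity check of the identity $Cov(x_j,x_k)=\frac{1}{n}(X^TX)_{jk}-\bar{x}_j\bar{x}_k$, here is a minimal NumPy sketch (the random matrix is just an arbitrary illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))  # arbitrary design matrix: rows = observations, cols = features

def emp_cov(X, j, k):
    """Empirical covariance of columns j and k, as defined above."""
    xbar = X.mean(axis=0)
    return np.mean((X[:, j] - xbar[j]) * (X[:, k] - xbar[k]))

# Check Cov(x_j, x_k) = (1/n) (X^T X)_{jk} - xbar_j * xbar_k for every pair
G = X.T @ X
xbar = X.mean(axis=0)
for j in range(p):
    for k in range(p):
        assert np.isclose(emp_cov(X, j, k), G[j, k] / n - xbar[j] * xbar[k])
print("identity holds for all pairs")
```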

But $X^TX=I$ itself is neither sufficient nor necessary for uncorrelated features.

It is not sufficient: let $X=I$, say with $p=n=2$. Then $X^TX=I$, but you can check that the empirical covariance is $Cov(x_1,x_2)=-1/4$. This exemplifies that the columns need mean zero for the argument to work.
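The non-sufficiency counterexample can be checked directly in NumPy (the empirical covariance comes out to $-1/4$):

```python
import numpy as np

X = np.eye(2)                 # design matrix X = I, so X^T X = I
xbar = X.mean(axis=0)         # column means are (1/2, 1/2) -- not zero
cov12 = np.mean((X[:, 0] - xbar[0]) * (X[:, 1] - xbar[1]))
print(cov12)                  # -0.25: the columns are correlated despite X^T X = I
```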

It is not necessary: again let $p=n=2$, and this time define $X$ with columns $x_1:=(1,-1)$ and $x_2:=(1,1)$. A simple calculation shows that $X^TX=2I$ and $Cov(x_1,x_2)=0$. This exemplifies that the argument above only needs $X^TX$ to be diagonal; the magnitude of the diagonal elements themselves is irrelevant.
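The non-necessity case can be verified the same way, using columns $x_1=(1,-1)$ and $x_2=(1,1)$, which give $X^TX=2I$:

```python
import numpy as np

# Columns x1 = (1, -1) and x2 = (1, 1): orthogonal, each with squared norm 2
X = np.array([[1.0, 1.0],
              [-1.0, 1.0]])
print(X.T @ X)                # 2 * I: diagonal, but not the identity

xbar = X.mean(axis=0)         # (0, 1): x1 already has mean zero
cov12 = np.mean((X[:, 0] - xbar[0]) * (X[:, 1] - xbar[1]))
print(cov12)                  # 0.0: uncorrelated even though X^T X != I
```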


$X^TX=XX^T=I$ is one definition of an orthogonal matrix. This means the columns form an orthonormal set: each column is orthogonal to the others and has unit norm (and the same holds for the rows).

As a consequence, there is no correlation between the columns (or rows), provided they have mean zero, because the inner product between distinct columns (or rows) is always $0$.