Why is $X^TX$ being the identity matrix equivalent to saying there is no correlation between features?

1k Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail

This was mentioned in the lecture as one of the assumptions for the problem of finding a hard/soft thresholding estimator. My professor mentioned this assumption on another occasion as well, so I think it is important. But I don't know why: why is the design matrix $X$ being orthonormal equivalent to saying there is no correlation between the features? Call $x_1,\dots,x_p$ the columns of my design matrix $X$. Orthonormal means that for any $x_i, x_j$ with $i\neq j$, their inner product is $0$ (and each column has unit norm); on the other hand, no correlation means that $Cov(x_i,x_j)=0$ for $i\neq j$. But the first is a purely linear-algebraic statement, whereas the second involves the probability distributions of $x_i,x_j$.
There are 2 best solutions below
$X^TX=XX^T=I$ is one definition of an orthogonal matrix. This means that each column (and, equivalently, each row) is orthogonal to every other column (or row) and has unit norm.
As a consequence, there is no correlation between the columns (or rows), because the inner product between distinct columns (or rows) is always $0$.
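A quick numerical sketch of the defining property, using a random orthogonal matrix built via QR decomposition (a standard construction; the specific size and seed are arbitrary):

```python
import numpy as np

# Build a random orthogonal matrix Q from the QR decomposition
# of a random square matrix.
rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Q^T Q = I: distinct columns have inner product 0, each column has norm 1.
print(np.allclose(Q.T @ Q, np.eye(4)))  # True
# For a square matrix this also forces Q Q^T = I (rows are orthonormal too).
print(np.allclose(Q @ Q.T, np.eye(4)))  # True
```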
The statement is only partly true, with two caveats: 1. you need to center the columns of $X$, and 2. the magnitude of the diagonal elements doesn't matter.
So the statement should be: for a design matrix $X$ with centered columns, the features are uncorrelated if and only if $X^TX$ is a diagonal matrix.
By the design matrix $X$, one usually means an $n\times p$ matrix whose columns are the features and whose rows are the observations. Consider any two features $x_j,x_k$ with $j\neq k$. The empirical covariance is defined as $$ Cov(x_j,x_k):=\frac{1}{n}\sum_{i=1}^n(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k), $$ where $\bar{x}_j:=n^{-1}\sum_{i=1}^n x_{ij}$. Expanding the product, it is easy to show that $$ Cov(x_j,x_k)=\frac{1}{n}\sum_{i=1}^n x_{ij}x_{ik}-\bar{x}_j\bar{x}_k=\frac{1}{n}(X^TX)_{jk}-\bar{x}_j\bar{x}_k. $$ Now, if you knew that $X^TX$ is a diagonal matrix, it would follow that $(X^TX)_{jk}=0$, and in particular $$ Cov(x_j,x_k)=-\bar{x}_j\bar{x}_k. $$ So if $\bar{x}_j=0$ or $\bar{x}_k=0$, in particular after centering the columns of $X$, $X^TX$ being a diagonal matrix implies that $x_j$ and $x_k$ are uncorrelated.
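The identity $Cov(x_j,x_k)=\frac{1}{n}(X^TX)_{jk}-\bar{x}_j\bar{x}_k$ can be checked numerically; a minimal sketch, assuming an arbitrary random design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))

j, k = 0, 1
xbar = X.mean(axis=0)

# Empirical covariance from the definition:
# (1/n) * sum_i (x_ij - xbar_j)(x_ik - xbar_k)
cov_def = np.mean((X[:, j] - xbar[j]) * (X[:, k] - xbar[k]))

# Same quantity via the Gram matrix: (1/n)(X^T X)_{jk} - xbar_j * xbar_k
cov_gram = (X.T @ X)[j, k] / n - xbar[j] * xbar[k]

print(np.isclose(cov_def, cov_gram))  # True
```

After centering the columns (`X - X.mean(axis=0)`), the second term vanishes and the covariance is just $\frac{1}{n}(X^TX)_{jk}$, which is the point of the argument above.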
But $X^TX=I$ itself is neither sufficient nor necessary for uncorrelated features.
It is not sufficient: let $X=I$, say with $p=n=2$. Then $X^TX=I$, but you can check that the empirical covariance is $Cov(x_1,x_2)=-1/4$. This exemplifies that we need mean-zero columns for the argument to work.
It is not necessary: again let $p=n=2$, and this time define $X$ with columns $x_1:=(1,-1)$ and $x_2:=(1,1)$. A simple calculation shows that $X^TX=2I$ and $Cov(x_1,x_2)=0$. This exemplifies that we only need $X^TX$ to be diagonal in the argument above; the magnitude of the diagonal elements themselves is irrelevant.
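Both counterexamples are small enough to verify directly. A sketch, assuming columns $x_1=(1,-1)$ and $x_2=(1,1)$ for the second case (chosen so that $X^TX=2I$ with mean-zero $x_1$):

```python
import numpy as np

def emp_cov(x, y):
    """Empirical covariance: (1/n) * sum_i (x_i - xbar)(y_i - ybar)."""
    return np.mean((x - x.mean()) * (y - y.mean()))

# Not sufficient: X = I has X^T X = I, yet its columns are correlated.
X = np.eye(2)
print(np.allclose(X.T @ X, np.eye(2)))   # True
print(emp_cov(X[:, 0], X[:, 1]))         # -0.25, not 0

# Not necessary: X^T X = 2I (diagonal, not the identity), yet Cov = 0.
Y = np.array([[1.0, 1.0],
              [-1.0, 1.0]])              # columns (1,-1) and (1,1)
print(np.allclose(Y.T @ Y, 2 * np.eye(2)))  # True
print(emp_cov(Y[:, 0], Y[:, 1]))            # 0.0
```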