Distribution of $X(X^\top X)^{-1}X^\top$ when $X$ is a matrix of iid normals


I'm looking at a sparse linear regression model. For the problem I'm working on, I would like to show that there are many possible choices of the independent variables that are wrong but still achieve low residual error. In particular, I have the setup $Y=X\beta^*+W$, where $Y$ is $n\times 1$, $X$ is $n\times p$ with entries $X_{ij}$ i.i.d. $N(0,\sigma^2_x)$, and the entries of $W$ are i.i.d. $N(0,\sigma^2_w)$. Assuming $n\ge p$, so that $X^\top X$ is invertible almost surely, the OLS estimate is $\beta_{OLS}=(X^\top X)^{-1}X^\top Y$, which gives the prediction $\hat{Y}=X(X^\top X)^{-1}X^\top Y$. I want to consider the case where the true $\beta^*$ is actually 0 (so $Y=W$) and bound the correlation between $Y$ and $\hat{Y}$, i.e. see whether $$\frac{Y^\top\hat{Y}}{\lVert Y \rVert^2}$$ could be close to 1 even though $X$ is independent of $Y$.
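A quick Monte Carlo sketch of the $\beta^*=0$ case may help calibrate what "close to 1" means here. It uses NumPy, assumes $\sigma_x=\sigma_w=1$ by default, and the function names `fit_ratio` and `simulate` are hypothetical. Because $\hat Y = PY$ with $P = X(X^\top X)^{-1}X^\top$ symmetric and idempotent, the ratio equals $\lVert PY\rVert^2/\lVert Y\rVert^2$ and so lies in $[0,1]$. Since $Y$ is independent of $X$, $Y/\lVert Y\rVert$ is uniform on the sphere given $X$, so the conditional mean of the ratio is $\operatorname{tr}(P)/n = p/n$; the simulated mean can be compared against that value.

```python
import numpy as np

def fit_ratio(n, p, sigma_x=1.0, sigma_w=1.0, rng=None):
    """One draw of Y^T Y_hat / ||Y||^2 when beta* = 0 (so Y = W).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = rng.normal(0.0, sigma_x, size=(n, p))   # X_ij i.i.d. N(0, sigma_x^2)
    Y = rng.normal(0.0, sigma_w, size=n)        # beta* = 0, so Y = W
    # OLS via least squares instead of forming (X^T X)^{-1} explicitly
    beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ beta_ols                        # Y_hat = P Y
    return float(Y @ Y_hat / (Y @ Y))

def simulate(n, p, trials=2000, seed=0):
    """Repeat fit_ratio `trials` times with a seeded generator."""
    rng = np.random.default_rng(seed)
    return np.array([fit_ratio(n, p, rng=rng) for _ in range(trials)])

if __name__ == "__main__":
    n, p = 100, 20
    r = simulate(n, p)
    print(f"mean ratio {r.mean():.3f} (p/n = {p / n:.3f}), max {r.max():.3f}")
```

In this sketch, taking $p$ close to $n$ (e.g. `simulate(30, 28)`) pushes the ratio towards 1 even though $X$ carries no information about $Y$.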

I would therefore like to determine the distribution of $X(X^\top X)^{-1}X^\top$ when $X$ is an $n\times p$ matrix of i.i.d. normals, i.e. $X_{ij}$ are i.i.d. $N(0, \sigma^2)$ for all $i,j$. I know that in this case $X^\top X$ is distributed as a Wishart $W_p(\sigma^2 I_p,n)$ but I haven't got any further than this.
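The sketch below (NumPy; the helper names `hat_matrix`, `check_projection` and `wishart_mean` are hypothetical) numerically checks two facts relevant to the question. First, $P = X(X^\top X)^{-1}X^\top$ is symmetric, idempotent and has trace $p$, i.e. it is the orthogonal projection onto the column space of $X$; rescaling $X$ leaves $P$ unchanged, so its distribution does not depend on $\sigma$. Second, the Monte Carlo average of $X^\top X$ should be close to $n\sigma^2 I_p$, the mean of $W_p(\sigma^2 I_p, n)$.

```python
import numpy as np

def hat_matrix(X):
    """P = X (X^T X)^{-1} X^T, computed with a linear solve rather than an explicit inverse."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def check_projection(n=50, p=5, sigma=2.0, seed=0):
    """Check the projection properties of P for one draw of X with i.i.d. N(0, sigma^2) entries."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, sigma, size=(n, p))
    P = hat_matrix(X)
    return {
        "symmetric": np.allclose(P, P.T),
        "idempotent": np.allclose(P @ P, P),
        "trace": float(np.trace(P)),                           # equals rank(P) = p
        "sigma_invariant": np.allclose(P, hat_matrix(X / sigma)),  # P ignores scaling of X
    }

def wishart_mean(n=50, p=5, sigma=2.0, trials=4000, seed=0):
    """Monte Carlo estimate of E[X^T X]; for W_p(sigma^2 I_p, n) this is n * sigma^2 * I_p."""
    rng = np.random.default_rng(seed)
    total = np.zeros((p, p))
    for _ in range(trials):
        X = rng.normal(0.0, sigma, size=(n, p))
        total += X.T @ X
    return total / trials

if __name__ == "__main__":
    print(check_projection())
    print(np.round(wishart_mean(), 1))
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a standard numerical choice and does not change $P$ in exact arithmetic.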