Prediction error in least squares with a linear model


Consider the classical linear model $$Y=X\beta +\epsilon,$$ where $Y \in \mathbb{R}^n$ is the observation vector, $X\in \mathbb{R}^{n\times p}$ is the known covariate matrix, $\beta \in \mathbb{R}^p$ is the unknown parameter with $p < n$, and $\epsilon \in \mathbb{R}^{n}$ is noise with $\epsilon \sim \mathcal{N}(0,\sigma^{2}I)$.

The classical least squares estimator here would be $$\hat{\beta}= (X^TX)^{-1}X^TY.$$

If $\beta^{0}$ is the true parameter, the prediction error is $E=||X(\hat{\beta}- \beta^{0})||_2^2$. Since $\hat{\beta}=\beta^{0}+(X^TX)^{-1}X^T\epsilon$, we have $$E=||X(\hat{\beta}- \beta^{0})||_2^2=||X(X^TX)^{-1}X^T \epsilon||_2^2=||\gamma||_2^2, \qquad \gamma := X(X^TX)^{-1}X^T \epsilon.$$

It is easy to see that $\gamma \sim \mathcal{N}(0,\sigma^2 X(X^TX)^{-1}X^T)$. However, it is claimed in High Dimensional Statistics by Bühlmann and van de Geer (on page 101) that $E/\sigma^2$ is distributed according to a chi-square distribution with $p$ degrees of freedom. I cannot see how this is true. (It would be true if $\gamma \sim \mathcal{N}(0,D)$ for some diagonal matrix $D$ with non-negative diagonal entries, but the fact that the coordinates of $\gamma$ are correlated seems to make this false.) Am I missing something here?
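As a numerical sanity check of the book's claim, here is a small Monte Carlo sketch in numpy (the dimensions `n`, `p` and the noise level `sigma` are arbitrary choices of mine): if $E/\sigma^2 \sim \chi^2_p$, then its sample mean should be close to $p$ and its sample variance close to $2p$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 2.0          # arbitrary dimensions and noise level
X = rng.standard_normal((n, p))   # a fixed design matrix of full column rank (a.s.)
beta0 = rng.standard_normal(p)    # the "true" parameter

reps = 20000
E_over_sigma2 = np.empty(reps)
for i in range(reps):
    eps = sigma * rng.standard_normal(n)           # eps ~ N(0, sigma^2 I)
    y = X @ beta0 + eps
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator
    # prediction error E = ||X(beta_hat - beta0)||_2^2, scaled by sigma^2
    E_over_sigma2[i] = np.sum((X @ (beta_hat - beta0)) ** 2) / sigma**2

# A chi^2_p variable has mean p and variance 2p
print(E_over_sigma2.mean())  # approx p = 3
print(E_over_sigma2.var())   # approx 2p = 6
```

The empirical mean and variance match $p$ and $2p$, consistent with a $\chi^2_p$ distribution despite the correlated coordinates of $\gamma$.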

Best answer:

It might be helpful to note that $P := X(X^TX)^{-1}X^T$ is not $I_n$ (that would require $p=n$), but the orthogonal projection onto the column space of $X$: it satisfies $P^T=P$ and $P^2=P$, so its eigenvalues are $p$ ones and $n-p$ zeros. Writing $P=U\Lambda U^T$ with $U$ orthogonal, $$\frac{||\gamma||_2^2}{\sigma^2}=\frac{\epsilon^T P \epsilon}{\sigma^2}=\sum_{i=1}^{p}\left(\frac{u_i^T\epsilon}{\sigma}\right)^2,$$ where the $u_i$ span the column space of $X$. Since $U^T\epsilon \sim \mathcal{N}(0,\sigma^2 I)$, the terms $u_i^T\epsilon/\sigma$ are i.i.d. $\mathcal{N}(0,1)$, so the sum is $\chi^2_p$. The correlation of the coordinates of $\gamma$ disappears after this rotation.
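The projection structure can be checked concretely with a small numpy sketch (dimensions are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 4
X = rng.standard_normal((n, p))            # full column rank almost surely
P = X @ np.linalg.solve(X.T @ X, X.T)      # P = X (X^T X)^{-1} X^T

# P is symmetric and idempotent: an orthogonal projection
print(np.allclose(P, P.T), np.allclose(P @ P, P))

# Its eigenvalues are p ones and n - p zeros, so trace(P) = rank(P) = p
eig = np.linalg.eigvalsh(P)
print(int(np.sum(np.isclose(eig, 1.0))))   # p = 4 eigenvalues equal to 1
print(int(np.sum(np.isclose(eig, 0.0))))   # n - p = 6 eigenvalues equal to 0
print(np.trace(P))                         # approx p = 4
```

This is exactly why the degrees of freedom come out to $p$: only $p$ directions of the noise survive the projection.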