Let $X\in \mathbb R^d,Y\in \mathbb R$ be random variables. The expected prediction error of a predictor $f$ is
$$ \mathbb E[(f(X) - Y)^2]. $$
Suppose the true distribution of data satisfies
- There are $\beta\in\mathbb R^d$ and $\sigma^2\in\mathbb R_{>0}$ such that $Y\mid X\sim \mathcal N(\beta^TX, \sigma^2)$.
- $\mathbb E[X] = 0$.
Suppose we draw an i.i.d. sample of $N$ points from this distribution, and let $f$ be the least-squares linear regression predictor fit to this sample.
Equation 2.28 of The Elements of Statistical Learning computes the expected prediction error of $f$ as approximately
$$ \mathrm{EPE}(f)\approx \sigma^2(d/N) + \sigma^2. $$
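The approximation is easy to check empirically. Here is a minimal Monte Carlo sketch (all parameter values, e.g. $d=5$, $N=50$, $\sigma=1$, are arbitrary choices for illustration) that fits least squares on fresh training data many times and averages the squared error on an independent test point; the result should be close to $\sigma^2(1 + d/N) = 1.1$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, sigma, trials = 5, 50, 1.0, 2000
beta = rng.standard_normal(d)  # hypothetical true coefficients

err = 0.0
for _ in range(trials):
    # Training sample: E[X] = 0, Cov(X) = I here for simplicity
    X = rng.standard_normal((N, d))
    y = X @ beta + sigma * rng.standard_normal(N)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    # Independent test point (x_0, y_0)
    x0 = rng.standard_normal(d)
    y0 = x0 @ beta + sigma * rng.standard_normal()
    err += (x0 @ beta_hat - y0) ** 2

print(err / trials)  # should be close to sigma**2 * (1 + d/N) = 1.1
```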
Is there a way to make this more formal?
It is only approximate because it uses the fact that $\mathbf{X}^T\mathbf{X}/N\to \mathrm{Cov}(X)$ as $N\to\infty$, where $\mathbf{X}$ is the $N\times d$ data matrix, to conclude that
$$ \mathbb E [x_0^T (\mathbf{X}^T\mathbf{X})^{-1} x_0] \approx \mathbb E [x_0^T \mathrm{Cov}(X)^{-1} x_0]/N, $$
where $x_0$ is a fresh observation of $X$, independent of $\mathbf{X}$.
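For context, the step that finishes the calculation in ESL is the trace identity: since $\mathbb E[X] = 0$, we have $\mathrm{Cov}(X) = \mathbb E[x_0 x_0^T]$, and therefore

$$
\mathbb E\left[x_0^T\, \mathrm{Cov}(X)^{-1} x_0\right]
= \mathbb E\left[\operatorname{tr}\!\left(\mathrm{Cov}(X)^{-1} x_0 x_0^T\right)\right]
= \operatorname{tr}\!\left(\mathrm{Cov}(X)^{-1}\, \mathbb E[x_0 x_0^T]\right)
= \operatorname{tr}(I_d) = d,
$$

which turns the displayed approximation into $\mathbb E [x_0^T (\mathbf{X}^T\mathbf{X})^{-1} x_0] \approx d/N$ and yields the $\sigma^2(d/N)$ term above.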