For the multiple linear regression model with Gaussian noise, $\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$, where
- $\mathbf{Y} \in \mathbb{R}^n$: the vector of dependent variables,
- $\mathbf{X} \in \mathbb{R}^{n \times p}$: each row is a vector of covariates,
- $\boldsymbol{\epsilon} \in \mathbb{R}^n$: Gaussian noise $\boldsymbol{\epsilon} \sim \mathcal{N}\big(0, \sigma^2 I_n\big)$ for some constant $\sigma > 0$,
the maximum likelihood estimator of $\boldsymbol{\beta}$ is simply the least squares estimator $\hat{\boldsymbol{\beta}} = \big(\mathbf{X}^{T} \mathbf{X} \big)^{-1} \mathbf{X}^{T} \mathbf{Y}$ (assuming $\mathbf{X}^{T} \mathbf{X}$ is invertible).
It is easy to compute the quadratic risk of this estimator: $$\mathbb{E}\big[\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\|_2^2\big] = \sigma^2 \mathrm{tr}\Big(\big(\mathbf{X}^{T} \mathbf{X} \big)^{-1}\Big).$$
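As a quick sanity check of this risk formula, here is a small Monte Carlo sketch in NumPy. The fixed design, the true $\boldsymbol{\beta}$, and $\sigma$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 0.5
X = rng.normal(size=(n, p))           # fixed design matrix (illustrative)
beta = np.array([1.0, -2.0, 0.5])     # true coefficients (illustrative)

# Monte Carlo estimate of the quadratic risk E[||beta_hat - beta||_2^2]
reps = 5000
sq_errors = np.empty(reps)
for r in range(reps):
    Y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # least squares estimate
    sq_errors[r] = np.sum((beta_hat - beta) ** 2)

mc_risk = sq_errors.mean()
theory = sigma**2 * np.trace(np.linalg.inv(X.T @ X))
print(mc_risk, theory)   # the two agree up to Monte Carlo error
```

The simulated risk should match $\sigma^2 \mathrm{tr}\big((\mathbf{X}^T\mathbf{X})^{-1}\big)$ up to Monte Carlo noise of a few percent.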
My question: does this expression imply that the risk goes to zero as $n$ goes to infinity (i.e., we have more and more data)?
This requires $\lim_{n \to \infty} \mathrm{tr}\Big(\big(\mathbf{X}^{T} \mathbf{X} \big)^{-1}\Big) = 0$. When $p = 1$ this is easy to see directly, since $\mathrm{tr}\big((\mathbf{X}^{T}\mathbf{X})^{-1}\big) = 1/\sum_{i=1}^n x_i^2$, which tends to $0$ as long as $\sum_{i=1}^n x_i^2 \to \infty$.
Note that \begin{align*} \mathrm{tr}\Big(\big(\mathbf{X}^{T} \mathbf{X} \big)^{-1}\Big) &= \frac{1}{n}\mathrm{tr}\Big(\big(\tfrac{1}{n}\mathbf{X}^{T} \mathbf{X} \big)^{-1}\Big). \end{align*} Now, in regression analysis there is usually an assumption that guarantees that $$\frac{1}{n}\mathbf{X}^{T} \mathbf{X} \to \Sigma, \text{ as $n \to \infty$},\tag{1}$$ where $\Sigma \in \mathbb R^{p\times p}$ is some symmetric positive definite matrix. If the rows $X_i$ are i.i.d., this follows from the law of large numbers; if the $X_i$ are deterministic, the assumption is often simply imposed as stated.
Hence assume that $(1)$ holds. Since matrix inversion is continuous at the invertible matrix $\Sigma$ and the trace is continuous, $\mathrm{tr}\Big(\big(\frac{1}{n}\mathbf{X}^{T} \mathbf{X} \big)^{-1}\Big) \to \mathrm{tr}(\Sigma^{-1})$ as $n\to \infty$, and therefore $$\frac{1}{n}\mathrm{tr}\Big(\big(\frac{1}{n}\mathbf{X}^{T} \mathbf{X} \big)^{-1}\Big) \to 0, \text{ as $n\to \infty$}.$$
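This convergence is easy to see numerically. A sketch assuming i.i.d. standard normal rows, so that $\Sigma = I_p$ and $\mathrm{tr}(\Sigma^{-1}) = p$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
traces, risks = [], []
for n in [100, 1000, 10000, 100000]:
    # i.i.d. standard normal rows, so (1/n) X^T X -> Sigma = I_p by the LLN
    X = rng.normal(size=(n, p))
    traces.append(np.trace(np.linalg.inv(X.T @ X / n)))  # -> tr(Sigma^{-1}) = p
    risks.append(np.trace(np.linalg.inv(X.T @ X)))       # -> 0 at rate O(p/n)
    print(n, traces[-1], risks[-1])
```

The normalized trace stabilizes near $p$ while $\mathrm{tr}\big((\mathbf{X}^T\mathbf{X})^{-1}\big)$ shrinks like $p/n$, matching the argument above.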