The context is linear regression analysis for estimating a population mean. Assume the usual linear model: $$Y = X\beta + \epsilon,$$ with $X$ an $n \times p$ covariate matrix including an intercept, $Y$ an $n \times 1$ response vector, and $\epsilon$ an $n \times 1$ vector of independent and identically distributed error terms with mean zero and variance $\sigma^2$. We estimate $\beta$ by OLS as $\hat{\beta}=(X^TX)^{-1}X^Ty$ from data $\{(y_1,x^T_1),\dots,(y_n,x^T_n)\}$.
Now suppose we have a new sample $X_0$ of $n_0$ observations drawn from a population distribution $p(x)$. Assume the same model applies in this population, but we do not observe $Y$. We are interested in estimating $E(Y)$ and, in particular, in the variance of its estimator $\hat{E}(Y)$.
Since $$E(Y)=E_x(E(Y\mid x))=E_x( x^T\beta ),$$ a natural unbiased Monte-Carlo-integration-type estimator for $E(Y)$ is $$\hat{E}(Y)= n_0^{-1} \sum_{i=1}^{n_0} x^T_{0i} \hat{\beta} = n_0^{-1} e^T X_0 \hat{\beta},$$ with $e$ the $n_0 \times 1$ vector of ones. How can I derive the variance of $\hat{E}(Y)$? At first I thought it is simply $$V(\hat{E}(Y)) = n_0^{-2} e^T X_0 V(\hat{\beta}) X_0^T e,$$
with the well-known $V(\hat{\beta})=\sigma^2 (X^TX)^{-1}$. However, a simulation showed me this is not true. I think the problem is that both $\hat{\beta}$ and $X_0$ are random, but I am unsure how to derive an estimator that takes this into account.
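Here is a minimal NumPy sketch of the kind of simulation that exposes the gap (the data-generating process — intercept 1, slope 2, standard-normal covariate and errors — is just an illustrative assumption): the empirical variance of $\hat{E}(Y)$ comes out far larger than the plug-in formula, because the latter ignores the randomness in $X_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n0, reps = 200, 50, 5000
beta = np.array([1.0, 2.0])   # hypothetical DGP: intercept 1, slope 2
sigma = 1.0

est = np.empty(reps)
naive = np.empty(reps)
for r in range(reps):
    # training sample: intercept plus one N(0,1) covariate
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(scale=sigma, size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    # fresh sample X0 from the same covariate distribution
    X0 = np.column_stack([np.ones(n0), rng.normal(size=n0)])
    xbar0 = X0.mean(axis=0)
    est[r] = xbar0 @ beta_hat
    # naive formula n0^{-2} e'X0 V(beta_hat) X0'e, treating X0 as fixed
    naive[r] = sigma**2 * xbar0 @ XtX_inv @ xbar0

print("empirical Var of estimator:", est.var())
print("average naive formula:", naive.mean())
```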
I assume that $S_n\equiv\{(x_i^{\top},\epsilon_i)\}_{i=1}^n$ and $S_{0,m}\equiv\{(x_{0,i}^{\top},\epsilon_{0,i})\}_{i=1}^m$ are independent samples of i.i.d. copies of $(x^{\top},\epsilon)$ with $\mathsf{E}[\epsilon\mid x]=0$ a.s. Then, indeed, a reasonable estimator of $\mathsf{E}y$ is $$ \hat{\mathsf{E}}y=\hat{\mathsf{E}}x^{\top}\times\hat{\beta}, $$ where $$ \hat{\mathsf{E}}x=\frac{1}{m}\sum_{i=1}^m x_{0,i} \quad\text{and}\quad \hat{\beta}=\left(\frac{1}{n}\sum_{i=1}^n x_ix_i^{\top}\right)^{-1}\frac{1}{n}\sum_{i=1}^n x_iy_i. $$
Since $S_n$ and $S_{0,m}$ are independent, $$ \mathsf{E}[\hat{\mathsf{E}}y]=\mathsf{E}[\hat{\mathsf{E}}x^{\top}]\times \mathsf{E}[\hat{\beta}]=\mathsf{E}x^{\top}\beta, $$ and $$ \mathsf{E}[(\hat{\mathsf{E}}y)^2]=\mathsf{E}\left[\hat{\mathsf{E}}x^{\top}\left(\mathsf{E}\hat{\beta}\hat{\beta}^{\top}\right)\hat{\mathsf{E}}x\right]=\beta^{\top}W_m\beta+\sigma^2\operatorname{tr}(W_mV_n), $$ where $V_n\equiv\mathsf{E}\left[\left(\sum_{i=1}^n x_ix_i^{\top}\right)^{-1}\right]$ and $W_m\equiv\mathsf{E}[\hat{\mathsf{E}}x\hat{\mathsf{E}}x^{\top}]=\mathsf{E}x\mathsf{E}x^{\top}+m^{-1}\operatorname{Var}(x)$ (we assume that $V_n$ exists and is finite). Consequently, $$ \bbox[cornsilk,5px] { \begin{align} \operatorname{Var}(\hat{\mathsf{E}}y)&=\mathsf{E}[(\hat{\mathsf{E}}y)^2]-(\mathsf{E}[\hat{\mathsf{E}}y])^2\\ &=\sigma^2\mathsf{E}x^{\top}V_n\mathsf{E}x+\frac{1}{m}\left(\beta^{\top}\operatorname{Var}(x)\beta+\sigma^2\operatorname{tr}(\operatorname{Var}(x)V_n)\right). \end{align} } $$ It follows that $\operatorname{Var}(\hat{\mathsf{E}}y)\to\sigma^2\mathsf{E}x^{\top}V_n\mathsf{E}x$ as $m\to\infty$.
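As a sanity check, here is a small NumPy simulation under an assumed DGP with $x=(1,z)^{\top}$, $z\sim\mathcal{N}(0,1)$, so that $\mathsf{E}x=(1,0)^{\top}$ and $\operatorname{Var}(x)=\operatorname{diag}(0,1)$; $V_n$ itself has no closed form here and is approximated by Monte Carlo. The empirical variance of $\hat{\mathsf{E}}y$ should agree with the boxed formula up to simulation error:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, reps = 100, 30, 20000
beta = np.array([1.0, 2.0])          # hypothetical DGP
sigma = 1.0
Ex = np.array([1.0, 0.0])            # E[x] for x = (1, z), z ~ N(0,1)
Var_x = np.diag([0.0, 1.0])          # Var(x)

def draw_X(size):
    return np.column_stack([np.ones(size), rng.normal(size=size)])

# Monte-Carlo approximation of V_n = E[(sum_i x_i x_i')^{-1}]
Vn = np.zeros((2, 2))
for _ in range(2000):
    X = draw_X(n)
    Vn += np.linalg.inv(X.T @ X)
Vn /= 2000

theory = (sigma**2 * Ex @ Vn @ Ex
          + (beta @ Var_x @ beta + sigma**2 * np.trace(Var_x @ Vn)) / m)

est = np.empty(reps)
for r in range(reps):
    X = draw_X(n)
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    est[r] = draw_X(m).mean(axis=0) @ beta_hat

print("formula:", theory, " empirical:", est.var())
```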
If you are interested in the conditional variance of $\hat{\mathsf{E}}y$ given $X_n\equiv\{x_i\}_{i=1}^n$, then using the same reasoning one obtains $$ \bbox[cornsilk,5px] { \operatorname{Var}(\hat{\mathsf{E}}y\mid X_n)=\sigma^2\mathsf{E}x^{\top}\tilde{V}_n\mathsf{E}x+\frac{1}{m}\left(\beta^{\top}\operatorname{Var}(x)\beta+\sigma^2\operatorname{tr}(\operatorname{Var}(x)\tilde{V}_n)\right), } $$ where $\tilde{V}_n=\left(\sum_{i=1}^n x_ix_i^{\top}\right)^{-1}$. Similarly to the previous case, $$ \operatorname{Var}(\hat{\mathsf{E}}y\mid X_n)\to \Sigma_n\equiv\sigma^2\mathsf{E}x^{\top}\tilde{V}_n\mathsf{E}x $$ as $m\to\infty$. Finally, by the SLLN (assuming that $\mathsf{E}\|x\|^2<\infty$ and $\mathsf{E}xx^{\top}$ is positive definite), $$ n\Sigma_n\to \sigma^2\mathsf{E}x^{\top}\left(\mathsf{E}xx^{\top}\right)^{-1}\mathsf{E}x\quad\text{a.s.} $$
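The conditional version can be checked the same way (same assumed DGP as before): draw $X_n$ once, hold it fixed, and replicate only over the errors and the $X_0$ sample, so that $\tilde{V}_n$ is computable directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, reps = 100, 30, 20000
beta = np.array([1.0, 2.0])          # hypothetical DGP
sigma = 1.0
Ex = np.array([1.0, 0.0])
Var_x = np.diag([0.0, 1.0])

# X_n is drawn once and then held fixed across replications
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Vt = np.linalg.inv(X.T @ X)          # tilde V_n

theory = (sigma**2 * Ex @ Vt @ Ex
          + (beta @ Var_x @ beta + sigma**2 * np.trace(Var_x @ Vt)) / m)

est = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)   # new errors only
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    X0 = np.column_stack([np.ones(m), rng.normal(size=m)])
    est[r] = X0.mean(axis=0) @ beta_hat

print("formula:", theory, " empirical:", est.var())
```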
You may also be interested in the asymptotic distribution of $\hat{\mathsf{E}}y$. For each $k\in\mathbb{N}$, we compute $\hat{\mathsf{E}}y$ from two samples $\{(y_i,x_i^{\top})^{\top}\}_{i=1}^{n_k}$ and $\{x_{0,i}\}_{i=1}^{m_k}$ with $n_k,m_k\to\infty$ as $k\to\infty$. Let $\alpha_k$ be such that $$ \frac{\alpha_k}{\sqrt{m_k}}\to c_1<\infty \quad\text{and}\quad \frac{\alpha_k}{\sqrt{n_k}}\to c_2<\infty. $$ Then by the WLLN and CLT (assuming that $\mathsf{E}\|x\|^2<\infty$ and $\mathsf{E}xx^{\top}$ is positive definite), and using the independence of the two samples, \begin{align} \alpha_k\left(\hat{\mathsf{E}}y-\mathsf{E}y\right)&=\frac{\alpha_k}{\sqrt{m_k}}\hat{\beta}^{\top}\sqrt{m_k}\left(\hat{\mathsf{E}}x-\mathsf{E}x\right)+\frac{\alpha_k}{\sqrt{n_k}}\mathsf{E}x^{\top}\sqrt{n_k}\left(\hat{\beta}-\beta\right) \\ &=c_1\beta^{\top}N_1+c_2\mathsf{E}x^{\top}N_2+o_p(1), \end{align} where $N_1\sim \mathcal{N}(0,\operatorname{Var}(x))$, $N_2\sim \mathcal{N}(0,\sigma^2(\mathsf{E}xx^{\top})^{-1})$, and $N_1$ and $N_2$ are independent. Therefore, $$ \bbox[cornsilk,5px] { \alpha_k\left(\hat{\mathsf{E}}y-\mathsf{E}y\right)\xrightarrow{d}\mathcal{N}\left(0,c_1^2\beta^{\top}\operatorname{Var}(x)\beta+c_2^2\sigma^2\mathsf{E}x^{\top}\left(\mathsf{E}xx^{\top}\right)^{-1}\mathsf{E}x\right). } $$
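One can likewise check the limiting variance numerically. Taking $m_k=n_k$ so that $\alpha_k=\sqrt{n_k}$ and $c_1=c_2=1$, under the same assumed DGP ($x=(1,z)^{\top}$, $z\sim\mathcal{N}(0,1)$) the limit variance is $\beta^{\top}\operatorname{Var}(x)\beta+\sigma^2\mathsf{E}x^{\top}(\mathsf{E}xx^{\top})^{-1}\mathsf{E}x=4+1=5$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 400, 20000                 # m_k = n_k = n, so c1 = c2 = 1
beta = np.array([1.0, 2.0])          # hypothetical DGP
sigma = 1.0
Ey = 1.0                             # E[x]'beta = (1,0)·(1,2)

# limit variance beta'Var(x)beta + sigma^2 Ex'(Exx')^{-1}Ex;
# here Var(x) = diag(0,1), Exx' = I, Ex = (1,0), so it equals 4 + 1 = 5
avar = beta @ np.diag([0.0, 1.0]) @ beta + sigma**2 * 1.0

z = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    X0 = np.column_stack([np.ones(n), rng.normal(size=n)])
    z[r] = np.sqrt(n) * (X0.mean(axis=0) @ beta_hat - Ey)

print("empirical variance of the normalized estimator:", z.var())
```

The printed value should be close to `avar` for moderately large `n`.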