Deriving the variance of a predicted response variable in simple linear regression


For simple linear regression, I'm trying to derive the variance of the estimator of an individual response variable at some value $X = x^\ast$.

Let $\hat y^\ast$ denote the estimated conditional mean response at $X = x^\ast$, and let $Y^\ast$ denote an individual (new) response at $X = x^\ast$.

It seems like the estimators for BOTH $\hat y^\ast$ and $Y^\ast$ are:

$\hat\beta_0 + \hat\beta_1 x^\ast$

Obviously, the variances for these two quantities are different, and this is where I'm stuck: if I apply $\mathrm{Var}(\cdot)$ to the same estimator, I expect to get the same variance. However, we know that

  • the variance of the conditional mean response is $\sigma^2 (\frac{1}{n} + \frac{(x^\ast - \bar{x})^2}{S_{xx}})$
  • the variance of an individual response is $\sigma^2 (1 + \frac{1}{n} + \frac{(x^\ast - \bar{x})^2}{S_{xx}})$
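
To convince myself that the two formulas really do differ only by the extra $\sigma^2$ term, I ran a quick Monte Carlo sanity check (pure Python, with toy data and parameters of my own choosing). The same point estimate $\hat\beta_0 + \hat\beta_1 x^\ast$ is computed in every replication, but its variance across replications matches the first formula, while the variance of the prediction error $Y^\ast - \hat y^\ast$ matches the second:

```python
import random
import statistics

random.seed(0)

# Toy fixed design: the x values are held constant across replications
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
n = len(x)
beta0, beta1, sigma = 2.0, 0.5, 1.0  # true parameters (arbitrary choices)
x_star = 6.5

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# The two theoretical variances from the formulas above
var_mean = sigma**2 * (1 / n + (x_star - x_bar) ** 2 / sxx)
var_indiv = sigma**2 * (1 + 1 / n + (x_star - x_bar) ** 2 / sxx)

preds, pred_errors = [], []
for _ in range(100_000):
    # Simulate a fresh sample and fit OLS by the usual closed-form formulas
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    y_hat_star = b0 + b1 * x_star  # the SAME point estimate in both cases
    y_star = beta0 + beta1 * x_star + random.gauss(0, sigma)  # new observation

    preds.append(y_hat_star)
    pred_errors.append(y_star - y_hat_star)

# Empirical variance of the estimate vs. of the prediction error
print(statistics.variance(preds), "vs theoretical", var_mean)
print(statistics.variance(pred_errors), "vs theoretical", var_indiv)
```

The gap between the two empirical variances is close to $\sigma^2 = 1$, as the formulas predict, even though both are built from the identical point estimate.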

I have read several textbooks about this, and I keep seeing this:

\begin{align*} \mathrm{Var}(Y^*-\hat y^*) & = \mathrm{Var}(Y-\hat y\mid X=x^*)\\ & =\mathrm{Var}(Y\mid X=x^*)+\mathrm{Var}(\hat y\mid X=x^*)-2\,\mathrm{Cov}(Y,\hat y\mid X=x^*)\\ & = \sigma^2+\sigma^2\left[\frac{1}{n}+\frac{(x^*-\bar x)^2}{S_{xx}}\right]-0\\ & =\sigma^2\left[1+\frac{1}{n}+\frac{(x^*-\bar x)^2}{S_{xx}}\right] \end{align*}

This is from page 37, Section 2.7.4, of "A Modern Approach to Regression with R" by Simon Sheather.
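
One step the quoted derivation glosses over, which I can at least verify for myself, is why the covariance term is zero: $\hat\beta_0$ and $\hat\beta_1$ are functions of the training sample $(x_1, Y_1), \dots, (x_n, Y_n)$ only, while $Y^\ast$ is a fresh observation independent of that sample, so

\begin{align*} \mathrm{Cov}(Y^*,\hat y^*) = \mathrm{Cov}\bigl(Y^*,\ \hat\beta_0+\hat\beta_1 x^*\bigr) = 0, \end{align*}

which is why the $-2\,\mathrm{Cov}$ term drops out above.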

Where did $\mathrm{Var}(Y^\ast - \hat y^\ast)$ come from? Every textbook on this topic seems to pluck it out of thin air. If we want the variance of an estimator of $Y^\ast$, why don't we just apply $\mathrm{Var}(\cdot)$ to that estimator?