Covariance of Residuals and Fitted Values in Linear Regression

Consider the simple linear regression model

$Y_i = \beta_0 + \beta_1x_i + \epsilon_i$

where $\epsilon_i \overset{\text{indep}}{\sim} N(0, \sigma^2)$ for $i = 1,\dots,n$. Let $\hat{\beta_{0}}$ and $\hat{\beta_{1}}$ be the usual maximum likelihood estimators of $\beta_0$ and $\beta_1$, respectively. The $i$th residual is defined as $\hat{\epsilon_{i}} = Y_i - \hat{Y_{i}}$, where $\hat{Y_i} = \hat{\beta_{0}} + \hat{\beta_{1}}x_i$ is the $i$th fitted value.

Derive $Cov(\hat{\epsilon_{i}},\hat{Y_i})$

This is what I have so far; I keep getting stuck, even after trying several different approaches.

$$ \begin{align} Cov(\hat{\epsilon_{i}},\hat{Y_i}) &= E[\hat{\epsilon_{i}}\hat{Y_i}] - E[\hat{\epsilon_{i}}]E[\hat{Y_i}]\\ &= E[\hat{\epsilon_{i}}\hat{Y_i}] \text{ (as }E[\hat{\epsilon_{i}}]=0)\\ &= E[\hat{\epsilon_{i}}(Y_i - \hat{\epsilon_{i}})]\\ &= Y_iE[\hat{\epsilon_{i}}] - E[\hat{\epsilon_{i}}^2]\\ &= - E[\hat{\epsilon_{i}}^2] \end{align} $$

But I don't know how to proceed from here, since the residuals are not independent. Starting again from the second line but rearranging differently, I also got

$$ Cov(\hat{\epsilon_{i}},\hat{Y_i}) = Y_i^2 - E[\hat{Y_{i}}^2] = - E[\hat{\epsilon_{i}}^2] $$

I'm running into the same problem here. I feel like the answer is supposed to be zero, but I'm missing some piece of information needed to prove it.


Edit: I've been retrying this question, and I think the following may be a better method, although I'm stuck at a point here too: $$ \begin{align} Cov(\hat{\epsilon_{i}},\hat{Y_i}) &= Cov(Y_i - \hat{Y_i}, \hat{Y_i})\\ &= Cov(Y_i ,\hat{Y_i}) - Cov(\hat{Y_i},\hat{Y_i})\\ &= Cov(Y_i ,\hat{Y_i}) - Var(\hat{Y_i}) \end{align} $$ So I know what $Var(\hat{Y_i})$ is, but I do not know $Cov(Y_i, \hat{Y_i})$, although I suspect it equals $Var(\hat{Y_i})$. If someone could help me with a derivation of this, that would be amazing.
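
As a numerical sanity check on that suspicion (not a proof), here is a quick Monte Carlo sketch in numpy; the sample size, coefficients, noise level, and index $i$ are all made-up illustrative values. It estimates $Cov(Y_i, \hat{Y_i})$ and $Var(\hat{Y_i})$ across many simulated datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b0, b1, sigma = 10, 1.0, 2.0, 1.0      # made-up parameter values
x = np.linspace(0.0, 1.0, n)
i, reps = 3, 100_000                       # check the claim for one index i

# simulate `reps` independent datasets from the model at once
Y = b0 + b1 * x + sigma * rng.standard_normal((reps, n))

# least-squares (= ML) estimates for every replication
xc = x - x.mean()
b1_hat = (Y @ xc) / (xc @ xc)
b0_hat = Y.mean(axis=1) - b1_hat * x.mean()
Yhat_i = b0_hat + b1_hat * x[i]            # i-th fitted value, per replication

print(np.cov(Y[:, i], Yhat_i)[0, 1])       # Monte Carlo Cov(Y_i, Yhat_i)
print(Yhat_i.var(ddof=1))                  # Monte Carlo Var(Yhat_i)
```

The two printed numbers agree up to Monte Carlo error, which is what makes me suspect $Cov(Y_i, \hat{Y_i}) = Var(\hat{Y_i})$.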

2 Answers

Accepted Answer

\begin{align} Cov(Y_i, \hat Y_i) &= Cov(Y_i, \hat \beta_0 + \hat \beta_1 x_i ) \\ & = Cov(Y_i, \bar Y - \hat \beta_1 \bar x + \hat \beta_1 x_i )\\ & = Cov(Y_i, \bar Y + \hat \beta_1 (x_i - \bar x) )\\ & = \frac{1}{n}Var(Y_i) + Cov\left(Y_i , \frac{(x_i - \bar x) \sum_j (x_j - \bar x ) Y_j }{\sum_j ( x_j - \bar x ) ^ 2}\right)\\ & = \frac{1}{n}\sigma ^ 2 + Cov\left(Y_i , \frac{(x_i - \bar x) ^ 2 Y_i }{\sum_j ( x_j - \bar x ) ^ 2}\right)\\ & = \frac{1}{n}\sigma ^ 2 + \frac{(x_i - \bar x) ^ 2 Var(Y_i) }{\sum_j ( x_j - \bar x ) ^ 2}\\ & = \frac{1}{n}\sigma ^ 2 + \frac{\sigma ^ 2 (x_i - \bar x) ^ 2 }{\sum_j ( x_j - \bar x ) ^ 2}, \end{align}

where the fourth line uses $Cov(Y_i, \bar Y) = \frac{1}{n}Var(Y_i)$ together with $\hat \beta_1 = \sum_j (x_j - \bar x) Y_j / \sum_j (x_j - \bar x)^2$, and the fifth keeps only the $j = i$ term of the sum, since the $Y_j$ are independent.

which is exactly $Var(\hat Y_i)$. Plugging this into your last display gives $$Cov(\hat{\epsilon_{i}}, \hat{Y_i}) = Cov(Y_i, \hat{Y_i}) - Var(\hat{Y_i}) = 0.$$
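
If it helps to see this numerically: the quantity above is $\sigma^2$ times the $i$th diagonal entry of the hat matrix $H = X(X^\intercal X)^{-1}X^\intercal$ with $X = [\mathbf{1}, x]$, which equals $Var(\hat Y_i)$. Here is a minimal numpy sketch, with arbitrary illustrative values for $x$ and $\sigma$:

```python
import numpy as np

n, sigma = 10, 1.0                          # arbitrary illustrative values
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])        # design matrix [1, x]
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix; Var(Y_hat_i) = sigma^2 * H[i, i]

# closed form sigma^2 * (1/n + (x_i - xbar)^2 / Sxx) vs sigma^2 * diag(H)
closed_form = sigma**2 * (1.0 / n + (x - x.mean())**2 / np.sum((x - x.mean())**2))
print(np.allclose(closed_form, sigma**2 * np.diag(H)))   # True
```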

Answer

A more straightforward and cleaner derivation uses the matrix form of the model,
$$ y = \beta_0 \mathbf{1} + \beta_1 x + e = X \beta + e, $$
where $X = [\mathbf{1}, x]$ and $\beta^\intercal = [\beta_0, \beta_1].$ Then $\hat y = X (X^\intercal X)^{-1} X^\intercal y = P_X y,$ where $P_X$ is the orthogonal projector onto the linear subspace $V_X = \langle X \rangle,$ and $\hat e = y - \hat y = (I - P_X)y = P_X^\perp y.$

Finally,
$$ Cov(\hat e, \hat y) = Cov(P_X^\perp y, P_X y) = P_X^\perp \, Cov(y) \, P_X^\intercal = P_X^\perp (\sigma^2 I) P_X = \sigma^2 P_X^\perp P_X = 0, $$
since $P_X^\perp P_X = 0$ (projecting onto $V_X$ and then onto $V_X^\perp$ gives zero). Entry $(i,i)$ of this zero matrix is precisely $Cov(\hat e_i, \hat y_i) = 0.$ (Note that we never used normality, only that $Cov(y) = \sigma^2 I,$ meaning uncorrelated observations with a common variance.)
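
For anyone who wants to see the projector algebra in numbers, here is a minimal sketch (numpy, with an arbitrary design; the coefficients used to simulate $y$ are made up) confirming both $P_X^\perp P_X = 0$ and the realization-by-realization orthogonality $\hat e^\intercal \hat y = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10                                      # arbitrary illustrative size
x = rng.uniform(size=n)
X = np.column_stack([np.ones(n), x])        # X = [1, x]
P = X @ np.linalg.inv(X.T @ X) @ X.T        # orthogonal projector onto <X>
P_perp = np.eye(n) - P

# P_perp P = 0, hence Cov(e_hat, y_hat) = sigma^2 * P_perp P = 0
print(np.allclose(P_perp @ P, 0.0))         # True

# the orthogonality holds realization by realization, not just in expectation
y = 1.0 + 2.0 * x + rng.standard_normal(n)  # made-up coefficients; any y works
e_hat, y_hat = P_perp @ y, P @ y
print(np.isclose(e_hat @ y_hat, 0.0))       # True
```

That $\hat e^\intercal \hat y = 0$ for every single realization is the geometric content of the projection argument: the residual vector is always orthogonal to the fitted vector.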