The expectation of the residual sum of squares when the expected regression mean response does not equal the true mean response


In the textbook Applied Linear Statistical Models (5th edition) by Kutner, Nachtsheim, Neter, and Li, equation (9.12) on page 359 states:

$$\sum{E(y_i - \hat{y_i})^2} = \sum{(E(\hat{y_i}) - u_i)^2}+(n-p)\sigma^2$$
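To convince myself that (9.12) is stated correctly, I ran a quick Monte Carlo check. The setup below (fitting a straight line while the true mean is quadratic, and all the constants) is an arbitrary choice of mine, not from the book; it uses the standard fact that $\hat{y} = Hy$ with $H = X(X^TX)^{-1}X^T$ the hat matrix, so $E(\hat{y}) = Hu$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 30, 2, 1.0

# Fit a straight line (p = 2 parameters) while the TRUE mean response is
# quadratic, so E(y_hat_i) != u_i and the bias term in (9.12) is nonzero.
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])          # design matrix of fitted model
u = 1.0 + 2.0 * x + 3.0 * x**2                # true mean response u_i

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix: y_hat = H @ y
rhs = np.sum((H @ u - u) ** 2) + (n - p) * sigma**2   # since E(y_hat) = H @ u

# Monte Carlo estimate of the left-hand side, sum_i E(y_i - y_hat_i)^2
reps = 20_000
Y = u + sigma * rng.normal(size=(reps, n))    # one replicate per row
lhs = np.mean(np.sum((Y - Y @ H) ** 2, axis=1))
print(lhs, rhs)                               # the two agree closely
```

The two printed numbers match to within Monte Carlo error, so the identity itself seems right; the problem is proving it.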

$u_i$ is the true mean response for observation $i$

$\hat{y_i}$ is the regression mean response for observation $i$

$\sigma^2$ is the variance of each $y_i$

$p$ is the number of parameters in the linear regression

It is also given that $$\sum{\sigma^2(\hat{y_i})} = p\sigma^2,$$ where $\sigma^2(\hat{y_i})$ denotes $\operatorname{Var}(\hat{y_i})$.
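This given fact is easy to check numerically: with hat matrix $H = X(X^TX)^{-1}X^T$ one has $\hat{y} = Hy$, so $\sum\sigma^2(\hat{y_i}) = \sigma^2\operatorname{tr}(H) = p\sigma^2$, because $H$ is a symmetric idempotent matrix of rank $p$. A small sketch (the design matrix and $\sigma^2$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
sigma2 = 2.0                          # variance of each y_i

# Arbitrary design matrix: an intercept plus p - 1 random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Hat matrix H: y_hat = H @ y, hence Cov(y_hat) = sigma2 * H (H is symmetric
# and idempotent), so Var(y_hat_i) = sigma2 * H[i, i].
H = X @ np.linalg.inv(X.T @ X) @ X.T

sum_var = sigma2 * np.trace(H)        # sum of Var(y_hat_i)
print(sum_var, p * sigma2)            # equal: trace(H) = rank(H) = p
```

Note this holds whether or not the model is correctly specified, since it only involves the covariance of $\hat{y}$, not its mean.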

In this case $E(\hat{y_i})$ is not necessarily equal to $u_i$.

No proof is given for equation (9.12), so I took a stab at it, but I couldn't finish. Here is what I have so far:

$$\begin{align}
\sum E(y_i - \hat{y_i})^2 &= \sum E(y_i - u_i + u_i - \hat{y_i})^2 \\
&= \sum\Big[E(y_i - u_i)^2 + 2E\big[(y_i - u_i)(u_i - \hat{y_i})\big] + E(u_i - \hat{y_i})^2\Big] \\
&= \sum\Big[\sigma^2 + 2E(y_i u_i) - 2E(u_i^2) - 2E(y_i\hat{y_i}) + 2E(u_i\hat{y_i}) + E(u_i^2) - 2E(u_i\hat{y_i}) + E(\hat{y_i}^2)\Big] \\
&= \sum\Big[\sigma^2 + 2u_i^2 - 2u_i^2 - 2E(y_i\hat{y_i}) + E(u_i^2) + E(\hat{y_i}^2)\Big] \\
&= \sum\Big[\sigma^2 - 2E(y_i\hat{y_i}) + E(u_i^2) + E(\hat{y_i}^2)\Big] \\
&= \sum\Big[\sigma^2 - 2E(y_i\hat{y_i}) + E(u_i^2) + E^2(\hat{y_i}) + \sigma^2(\hat{y_i})\Big] \\
&= \sum\Big[\sigma^2 - 2E(y_i\hat{y_i}) + \sigma^2(\hat{y_i}) + 2E(u_i\hat{y_i}) + E(u_i^2) + E^2(\hat{y_i}) - 2E(u_i\hat{y_i})\Big] \\
&= \sum\Big[\sigma^2 - 2E(y_i\hat{y_i}) + \sigma^2(\hat{y_i}) + 2E(u_i\hat{y_i}) + \big(E(\hat{y_i}) - u_i\big)^2\Big] \\
&= n\sigma^2 + p\sigma^2 + \sum\big(E(\hat{y_i}) - u_i\big)^2 - 2\sum\big[E(y_i\hat{y_i}) - E(u_i\hat{y_i})\big]
\end{align}$$

Comparing the last line with the right-hand side of equation (9.12), I need to prove that $$2\sum\big[E(y_i\hat{y_i}) - E(u_i\hat{y_i})\big] = 2p\sigma^2,$$ so basically I need to show $$\sum E\big[\hat{y_i}(y_i - u_i)\big] = p\sigma^2,$$ but I don't know how to do that.
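For what it's worth, this last identity at least checks out numerically, which suggests attacking it through the hat matrix: writing $\hat{y} = Hy$ with $H = X(X^TX)^{-1}X^T$, the target quantity is $E\big[(Hy)^T(y - u)\big]$. A Monte Carlo sketch under an arbitrary misspecified model of my own choosing (fitting a line to a quadratic true mean) agrees with $p\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 30, 2, 1.0

# Misspecified setup: fit a straight line, true mean response is quadratic,
# so E(y_hat_i) != u_i here.
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
u = 1.0 + 2.0 * x + 3.0 * x**2               # true mean response u_i

H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix: y_hat = H @ y

# Monte Carlo estimate of sum_i E[ y_hat_i * (y_i - u_i) ]
reps = 100_000
Y = u + sigma * rng.normal(size=(reps, n))   # one replicate per row
est = np.mean(np.sum((Y @ H) * (Y - u), axis=1))
print(est, p * sigma**2)                     # close to p * sigma^2
```

So the identity appears to hold even when the model is biased; I just can't see how to derive it.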