I want to derive the expectation of the sum of squares of the $n$ predicted $y$ values for a multiple linear regression model with $K$ regressors.


So I want to find $E(\sum_{i=1}^n \hat{y}_i^2) = E(\hat{y}^T \hat{y})$. I have that $$\hat{y}^T \hat{y} = \hat{\beta}^T X^T X \hat{\beta} = [(X^TX)^{-1}X^Ty]^TX^TX(X^TX)^{-1}X^Ty$$ $$ = y^TX(X^TX)^{-1}X^Ty $$ $$ = y^THy. $$ I'm not really sure where to go from here. The above result is a scalar, so I think I can take the trace and say that $$ E(y^THy)=E(\text{tr}(y^THy))=E(\sum_{i=1}^n{y_i}^2) $$ $$ =\sum_{i=1}^nE({y_i}^2) = \sum_{i=1}^nV(y_i)+\sum_{i=1}^n[E(y_i)]^2 $$ $$ =n({\sigma}^2 +\bar{y}^2). $$

I have a few questions about my approach here.

(1) Here we assumed that the $y$ are normally distributed with constant variance (due to the nature of the random error). Does this mean that the estimates, $\hat{y}$, are also normally distributed?

(2) I'm having trouble understanding intuitively why the expected value of the estimate would differ from the expected value of $y$. If what I just derived is correct, then $E(\sum_{i=1}^n \hat{y}_i^2)=E(\sum_{i=1}^n {y_i}^2)$; does this make sense?

(3) I'm having trouble understanding when an expectation can be further evaluated. For example, in this case should I have left the expectation in the final answer, or was it appropriate to replace it with $\bar{y}$?

Best answer:

$\hat{y} = X(X^TX)^{-1} X^T y = Hy$ and if $y \sim N(X\beta, \sigma^2 I)$, it follows that $\hat{y}$ is a linear transformation of $y$ and so is also Gaussian. We have

$$ E[\hat{y}] = E[X(X^TX)^{-1} X^T y] = X(X^TX)^{-1} X^T E[y] = X(X^TX)^{-1} X^T X \beta = X\beta $$ and \begin{align*} \text{var}[\hat{y}] &= X(X^TX)^{-1} X^T \text{var}(y) X (X^T X)^{-1}X^T\\ &= X(X^TX)^{-1} X^T \sigma^2 I X (X^T X)^{-1}X^T\\ &=\sigma^2 X(X^T X)^{-1}X^T \end{align*}

In summary,

$$ \hat{y} \sim N(X\beta, \sigma^2 X(X^T X)^{-1}X^T) = N(X\beta, \sigma^2H) $$
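The collapse of the sandwich form $X(X^TX)^{-1}X^T\,\sigma^2 I\,X(X^TX)^{-1}X^T$ to $\sigma^2 H$ relies on $H$ being symmetric and idempotent, which is easy to check numerically. A minimal NumPy sketch (the dimensions `n`, `K` and the value of `sigma2` are arbitrary choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sigma2 = 12, 3, 2.5          # arbitrary sizes/variance (assumed values)
X = rng.normal(size=(n, K))        # random design matrix, full column rank a.s.

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix H = X(X'X)^{-1} X'

# Sandwich form of var(y_hat): H (sigma^2 I) H^T
sandwich = H @ (sigma2 * np.eye(n)) @ H.T

assert np.allclose(H, H.T)             # H is symmetric
assert np.allclose(H @ H, H)           # H is idempotent
assert np.allclose(sandwich, sigma2 * H)   # so var(y_hat) = sigma^2 H
```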

So if we want to compute $E[\| \hat{y} \|^2]$, your working is correct up to $E[y^THy]$. Writing $E[y^THy] = E[\text{tr}(y^THy)] = E[\text{tr}(yy^TH)]$ and using $E[yy^T] = X\beta\beta^TX^T + \sigma^2 I$, \begin{align*} E[\text{tr}(yy^T H)] &= \text{tr}(E[yy^T]H)\\ &=\text{tr}((X\beta \beta^T X^T + \sigma^2I)H)\\ &=\text{tr}(X\beta \beta^T X^TH) + \sigma^2 \text{tr}(H)\\ &=\text{tr}(X\beta \beta^T X^T X(X^TX)^{-1}X^T) + \sigma^2 \text{tr}(H)\\ &=\text{tr}(X\beta \beta^TX^T) + \sigma^2 \text{tr}(H)\\ &=\text{tr}((X\beta)^TX\beta ) + \sigma^2 \text{tr}(H)\\ &=\| X\beta\|^2_2 + \sigma^2 \text{tr}(H)\\ \end{align*}
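The final identity can also be confirmed by Monte Carlo. A sketch with NumPy, where the design matrix, `beta`, `sigma2`, and the number of simulations are all arbitrary choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, K, sigma2, sims = 15, 3, 1.0, 200_000   # assumed values for the sketch
X = rng.normal(size=(n, K))
beta = np.array([0.5, -1.0, 0.8])

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
mu = X @ beta                              # E[y] = X beta

# Draw y ~ N(X beta, sigma^2 I), form y_hat = H y, and average ||y_hat||^2
eps = rng.normal(scale=np.sqrt(sigma2), size=(sims, n))
y_hat = (mu + eps) @ H.T                   # each row is H y for one draw
mc_estimate = np.mean(np.sum(y_hat**2, axis=1))

closed_form = mu @ mu + sigma2 * np.trace(H)   # ||X beta||^2 + sigma^2 tr(H)
print(mc_estimate, closed_form)                # the two should agree closely
```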

More generally, for $Z \sim N(\mu, \Sigma)$, $$ E[\|Z\|^2_2] = \| \mu\|^2_2 + \text{tr}(\Sigma) $$

so we could have derived the result directly by noting that $\hat{y}$ has a multivariate normal distribution. Also note that $\text{tr}(H) = \text{rank}(X)$, which equals $K$ when $X$ has full column rank.
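Both facts (the trace–rank identity and the general Gaussian second-moment formula) are easy to verify numerically. A sketch in NumPy; the design matrix and the Gaussian parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 10, 4
X = rng.normal(size=(n, K))                 # full column rank with prob. 1
H = X @ np.linalg.inv(X.T @ X) @ X.T

# tr(H) = rank(X) (= K here, since X has full column rank)
assert np.isclose(np.trace(H), np.linalg.matrix_rank(X))

# E||Z||^2 = ||mu||^2 + tr(Sigma) for Z ~ N(mu, Sigma), checked by simulation
mu = rng.normal(size=5)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T                             # random positive semidefinite cov
Z = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.sum(Z**2, axis=1))
closed = mu @ mu + np.trace(Sigma)
print(np.trace(H), mc, closed)              # mc and closed should be close
```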

It is also not a good idea (in my opinion, at least) to use sample notation for population quantities:

$$ \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i $$ is the sample average, a random variable, not a model parameter. In your last line you should write

$$ \frac{1}{n}\sum_{i=1}^n E[y_i] = \frac{1}{n}\sum_{i=1}^n \mu = \mu, $$ which holds when the $y_i$ are i.i.d. with common mean $\mu$. In the regression model, $E[y_i] = x_i^T\beta$ generally varies with $i$, so there is no single $\mu$, and the random quantity $\bar{y}$ certainly cannot take its place.