Proof for why $\sum_i e_i\hat{y}_i = 0$


I understand that this is true, but can someone explain to me why (preferably without getting into matrices)?

This is referring to a linear regression, where the $e_i$ refers to the i-th residual and $\hat{y_i}$ refers to the i-th fitted value.


Two answers follow.


The short answer is that this follows from least squares, where the residuals are orthogonal to the regressors. Writing the fitted value as $\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^p \hat{\beta}_j x_{ij}$, we have $$\sum_{i=1}^n e_i \hat{y}_i = \hat{\beta}_0 \sum_{i=1}^n e_i + \sum_{i=1}^n\sum_{j=1}^p\hat{\beta}_je_ix_{ij} = \hat{\beta}_0 \sum_{i=1}^n e_i + \sum_{j=1}^p\hat{\beta}_j\sum_{i=1}^ne_ix_{ij} . $$

From the first-order conditions (gradient of the sum of squares w.r.t. $\beta$) you have $\sum_{i=1}^n e_i x_{ij} = 0$ for every $j=0,1,\ldots,p$, where $x_{i0} = 1$ is the intercept column, so in particular $\sum_{i=1}^n e_i = 0$. Indeed, taking the derivative w.r.t. $\beta_j$ and setting it to zero gives $\sum_{i=1}^n\left(y_i - \hat{\beta}_0 - \sum_{k=1}^p\hat{\beta}_kx_{ik}\right) x_{ij}=\sum_{i=1}^n e_i x_{ij}=0$. Every term in the display above is therefore zero.
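The orthogonality conditions above are easy to check numerically. Here is a minimal sketch using NumPy on a simulated dataset (the data, sample sizes, and variable names are illustrative, not from the question):

```python
import numpy as np

# Hypothetical dataset: n = 50 observations, p = 2 predictors.
rng = np.random.default_rng(0)
n, p = 50, 2
x = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), x])          # design matrix with intercept column
y = 1.5 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
y_hat = X @ beta_hat                              # fitted values
e = y - y_hat                                     # residuals

# First-order conditions: residuals are orthogonal to every column of X
# (including the intercept column, hence sum(e) ~ 0) ...
print(X.T @ e)            # each entry is ~0 up to floating-point error

# ... and therefore to any linear combination of the columns, including y_hat:
print(e @ y_hat)          # ~0 up to floating-point error
```

Since $\hat{y}$ is a linear combination of the columns of $X$, orthogonality to each column immediately gives $\sum_i e_i \hat{y}_i = 0$.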


If you're comfortable with linear algebra, this is (in my opinion) the most intuitive way of looking at linear regression. Say you're given vectors $x_1,\ldots, x_p$ (the columns of the design matrix $X$) and $y$. Linear regression picks the weight vector $\hat{\beta}$ that minimizes the length of the residual vector $e = y - \hat{y}$, where $\hat{y} = X\hat{\beta}$ lies in the column space of the $x_j$'s. The minimizing $\hat{y}$ is the orthogonal projection of $y$ onto that column space, which is exactly the statement $$\sum_i e_i \hat{y}_i = \langle e, \hat{y} \rangle = 0.$$ To see why, imagine how the length of $e$ changes as you move $\hat{y}$ around in the plane spanned by the $x_j$'s: the shortest possible distance is attained when $\hat{y}$ is "directly below" $y$, i.e. when $e$ is perpendicular to the plane.
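The projection picture can also be sketched numerically: the least-squares $\hat{y}$ is closer to $y$ than any other point in the column space. A minimal NumPy illustration (the random data and names are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.normal(size=(n, p))   # columns span a 2-dimensional plane in R^30
y = rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat          # orthogonal projection of y onto col(X)
e = y - y_hat                 # residual vector, perpendicular to col(X)

# Moving y_hat anywhere else in the plane only increases the residual length:
for _ in range(5):
    other = X @ rng.normal(size=p)   # a random point in the plane col(X)
    assert np.linalg.norm(y - y_hat) <= np.linalg.norm(y - other)
```

The assertions in the loop are the geometric content of the answer: among all points in the plane, the projection minimizes the distance to $y$, and perpendicularity of $e$ to the plane is what makes $\langle e, \hat{y}\rangle = 0$.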