Least Squares Estimate B1 formula

My regression textbook says that $\sum_{i=1}^{n} (x_i - \overline{x}) = 0$.

I know that:

$\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})$

$= \sum_{i=1}^{n} (x_i - \overline{x})y_i - \overline{y}\sum_{i=1}^{n} (x_i - \overline{x})$

$= \sum_{i=1}^{n} (x_i - \overline{x})y_i$

Why doesn't the last expression simplify to zero as well? Do the $y_i$ and $\overline{y}$ make a difference? Any help clearing this up would be greatly appreciated.

There are 2 best solutions below

Short answer: the $y_i$ terms make a difference.

Example: consider the three pairs $(2, 4), (3, 5), (4, 8)$, for which $\bar x = 3$. Then \begin{align*} \sum_{i=1}^n (x_i - \overline x) y_i &= (2-3) \cdot 4 + (3-3) \cdot 5 + (4-3)\cdot 8 \\ &= -4 + 0 + 8 \\ &= 4. \end{align*} It is true that $\sum_{i=1}^n(x_i - \overline x) = 0$, but once you weight the $x_i - \overline x$ terms by other quantities such as $y_i$, there is no reason the weighted terms must still sum to $0$.
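If it helps to see the arithmetic checked mechanically, here is a minimal Python sketch using the same three pairs. The variable names are my own, and `Fraction` is used only so the means are exact rather than floating-point approximations.

```python
from fractions import Fraction

# The three (x, y) pairs from the example.
pairs = [(2, 4), (3, 5), (4, 8)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

# Exact sample means (x_bar is 3, y_bar is 17/3).
x_bar = Fraction(sum(xs), len(xs))
y_bar = Fraction(sum(ys), len(ys))

# Deviations of each x_i from the mean: -1, 0, 1.
deviations = [x - x_bar for x in xs]

# Unweighted deviations always cancel.
plain_sum = sum(deviations)

# Weighting each deviation by y_i breaks the cancellation.
weighted_sum = sum(d * y for d, y in zip(deviations, ys))

# Subtracting y_bar from each y_i changes nothing, because
# y_bar multiplies plain_sum, which is zero.
centered_sum = sum(d * (y - y_bar) for d, y in zip(deviations, ys))

print(plain_sum)     # 0
print(weighted_sum)  # 4
print(centered_sum)  # 4
```

The last two values agree, which is the identity from the question: the $\overline{y}$ term drops out, but the $y_i$ weights do not.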

Observe that $$\sum_{i=1}^{n} (x_i - \overline{x}) = \sum_{i=1}^n x_i - \sum_{i=1}^n \overline{x} = n \overline{x} - \overline{x} \sum_{i=1}^n 1 = n \overline{x} - n \overline{x}=0.$$

But the expression

$$ \sum_{i=1}^n (x_i - \overline{x})y_i = \sum_{i=1}^n x_i y_i - \overline{x} \sum_{i=1}^n y_i$$ is not necessarily zero (see Aaron Montgomery's example).
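As a rough numerical check of the expansion above, the following Python sketch compares both sides on made-up data. The function names and the sample values are hypothetical, chosen only for illustration, and the code assumes integer inputs so that `Fraction` can represent the mean exactly.

```python
from fractions import Fraction

def weighted_deviation_sum(xs, ys):
    """Left-hand side: sum of (x_i - x_bar) * y_i."""
    x_bar = Fraction(sum(xs), len(xs))
    return sum((x - x_bar) * y for x, y in zip(xs, ys))

def expanded_form(xs, ys):
    """Right-hand side: sum of x_i * y_i minus x_bar times the sum of y_i."""
    x_bar = Fraction(sum(xs), len(xs))
    return sum(x * y for x, y in zip(xs, ys)) - x_bar * sum(ys)

# Illustrative data (not from the question); x_bar is 3.
xs = [1, 2, 6]
varying_ys = [3, -1, 4]
constant_ys = [5, 5, 5]

# Both sides agree, and the value is nonzero when the y_i vary.
print(weighted_deviation_sum(xs, varying_ys), expanded_form(xs, varying_ys))    # 7 7

# With constant y_i the common factor pulls out and the sum is zero.
print(weighted_deviation_sum(xs, constant_ys), expanded_form(xs, constant_ys))  # 0 0
```

The second case shows why "not necessarily zero" is the right wording: the sum can vanish for particular $y_i$, for instance when they are all equal, but nothing forces it to in general.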