Proof that the least squares estimators are normally distributed


In my book I have the following proof showing that one of the least squares estimators is normally distributed:

$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{1}{S_{xx}}\sum_{i=1}^n(x_i - \bar{x})(Y_i - \bar{Y}) = \frac{1}{S_{xx}}\left[\sum_{i=1}^n(x_i - \bar{x})Y_i - \bar{Y}\sum_{i=1}^n(x_i - \bar{x})\right] = \frac{1}{S_{xx}}\sum_{i=1}^n(x_i - \bar{x})Y_i$

According to my book, the last equality holds because $\bar{Y}\sum_{i=1}^n(x_i-\bar{x})=0$. I find this very confusing. Shouldn't $\sum_{i=1}^n(x_i-\bar{x})Y_i=0$ as well, then? Can somebody explain to me what's going on?


There is 1 best solution below


It is clear from your comment that I need to start from the beginning.

We have a set of "input" data $(x_1, x_2, \ldots, x_n)$, for which the "output" or "response" is $(y_1, y_2, \ldots, y_n)$. We can also think of these as being ordered pairs: $$\{ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \}.$$ So the ordered pair $(x_i, y_i)$ represents the observed relationship between the $i^{\rm th}$ input $x_i$ and the $i^{\rm th}$ response $y_i$.

Now define the values

$$\bar x = \frac{1}{n} \sum_{i=1}^n x_i, \quad \bar y = \frac{1}{n} \sum_{i=1}^n y_i.$$ These are the sample means of the inputs and the responses, respectively. It is worth noting that multiplying both sides of each equation by $n$ gives $$n \bar x = \sum_{i=1}^n x_i, \quad n \bar y = \sum_{i=1}^n y_i;$$ namely, the sum of the observations equals $n$ times the sample mean.

We can now easily see that $$\sum_{i=1}^n (x_i - \bar x) = \sum_{i=1}^n x_i - \sum_{i=1}^n \bar x = n \bar x - n \bar x = 0. $$ This simply says that the sum of the deviations from the mean is zero. This makes intuitive sense: the deviations of the observations from their common mean cancel each other out. Put another way, the average of a set of numbers is the value for which the sum of the deviations is zero.

If we multiply this equation by any fixed quantity, it still remains true: so for example, $$\bar y \sum_{i=1}^n (x_i - \bar x) = 0$$ inasmuch as $ab = 0$ if $b = 0$. Here $\bar y$ is a single number that does not depend on the index $i$, which is why it can sit outside the sum. But this is NOT the case if you multiply by a value that changes inside the sum: $$\sum_{i=1}^n (x_i - \bar x) y_i$$ is not the same thing, because here the subscript on the $y_i$ means that it is a function of the index of summation $i$, so it changes from term to term. This is what I meant by my previous comment that, in general, $$a_1 b_1 + a_2 b_2 + a_3 b_3 \ne (a_1 + a_2 + a_3)(b_1 + b_2 + b_3).$$ You can't "factor out" the $y_i$ because it too is being summed at the same time as $(x_i - \bar x)$ is being summed. For the same reason, in general, $$\sum_{i=1}^n x_i y_i \ne \sum_{i=1}^n x_i \sum_{i=1}^n y_i.$$ The sum of products is not equal to the product of the sums. You need to really understand this concept before you can proceed further.
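To see why the second sum need not vanish, it may help to expand it using $\sum_{i=1}^n y_i = n \bar y$ from above. This is only a sketch of a standard identity, not something taken from your book: $$\begin{align*} \sum_{i=1}^n (x_i - \bar x) y_i &= \sum_{i=1}^n x_i y_i - \bar x \sum_{i=1}^n y_i \\ &= \sum_{i=1}^n x_i y_i - n \bar x \bar y. \end{align*}$$ The right-hand side is zero only in the special case $\frac{1}{n}\sum_{i=1}^n x_i y_i = \bar x \bar y$, that is, when the sample covariance of the data happens to be zero. For typical data it is not.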

Try it with $n = 3$ with the following data: $$\begin{align*} (x_1, y_1) &= (1,1) \\ (x_2, y_2) &= (2,5) \\ (x_3, y_3) &= (6, 11). \end{align*}$$ You will find $\bar x = (1+2+6)/3 = 3$; $\bar y = (1+5+11)/3 = 17/3$. Now compute $(x_1 - \bar x, x_2 - \bar x, x_3 - \bar x) = (-2, -1, 3)$, and we easily see that the sum is $$\sum_{i=1}^3 (x_i - \bar x) = -2 -1 + 3 = 0,$$ as expected. Then $$\bar y \sum_{i=1}^3 (x_i - \bar x) = \frac{17}{3} \cdot 0 = 0.$$ But $$\sum_{i=1}^3 (x_i - \bar x) y_i = (-2)(1) + (-1)(5) + (3)(11) = -2 -5 + 33 = 26 \ne 0.$$
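If you would rather check this by machine, here is a minimal Python sketch of the same calculation. The variable names (`xs`, `ys`, `sum_dev`, `weighted_sum`, `s_xy`) are my own choices for illustration, and `Fraction` is used only so that $\bar y = 17/3$ is handled exactly rather than as a rounded decimal.

```python
from fractions import Fraction

# Data from the n = 3 example above.
xs = [1, 2, 6]
ys = [1, 5, 11]
n = len(xs)

# Sample means, kept as exact fractions.
x_bar = Fraction(sum(xs), n)
y_bar = Fraction(sum(ys), n)

# Deviations of each x_i from the mean, and their sum.
deviations = [x - x_bar for x in xs]
sum_dev = sum(deviations)

# y_bar is a constant, so it multiplies the whole (zero) sum.
const_times_sum = y_bar * sum_dev

# y_i changes with i, so it stays inside the sum.
weighted_sum = sum(d * y for d, y in zip(deviations, ys))

# S_xy computed directly from its definition, for comparison.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

print("sum of deviations:       ", sum_dev)
print("y_bar * sum of deviations:", const_times_sum)
print("sum of (x_i - x_bar)*y_i: ", weighted_sum)
print("S_xy from definition:     ", s_xy)
```

The last two printed values agree, which is exactly the equality in your book: dropping the $\bar y \sum (x_i - \bar x)$ term changes nothing because that term is zero, while the remaining sum is not.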