Dividing observed entries by their variance in univariate linear regression

Suppose I have two random variables $X,Y$ with corresponding observations $(x_1,y_1), \ldots, (x_n,y_n)$. Then for the linear regression $y = a + bx$ I know that $\hat{a} = \bar{y}-\hat{b}\bar{x}$ and that $\hat{b} = \frac{\sigma_{xy}}{\sigma_{xx}}$.
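As a sanity check on these closed-form estimates, here is a minimal numerical sketch (the data and names are illustrative, and I assume the population-style $1/n$ convention for $\sigma_{xy}$ and $\sigma_{xx}$, which does not affect the slope since the factors cancel):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Population-style (1/n) covariance and variance, as in the question.
sigma_xy = np.mean((x - x.mean()) * (y - y.mean()))
sigma_xx = np.mean((x - x.mean()) ** 2)

b_hat = sigma_xy / sigma_xx          # slope estimate
a_hat = y.mean() - b_hat * x.mean()  # intercept estimate

# Compare against numpy's least-squares fit of a degree-1 polynomial.
slope, intercept = np.polyfit(x, y, 1)
print(np.isclose(b_hat, slope), np.isclose(a_hat, intercept))
```

Both comparisons come out equal up to floating point, since least squares gives exactly $\hat b = \sigma_{xy}/\sigma_{xx}$.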

In my machine learning textbook, the author states without proof that if we replace $x_i$ with $x_i' = \frac{x_i}{\sigma_{xx}}$, then $\hat{b} = \sigma_{x'y}$. I am having trouble seeing why this is the case, i.e. that $\sigma_{xx}=\frac{1}{n}$ (?)

The denominator of $\hat{b}$ is then

$\sum_{i=1}^{n}(\frac{x_i}{\sigma_{xx}}-\bar{x})^2 = \sum_{i=1}^{n} (\frac{x_i^2}{\sigma_{xx}^2} -2 \frac{x_i}{\sigma_{xx}}\bar{x}+\bar{x}^2)$

Any insights appreciated.

Best answer:

Your notation really should distinguish between sample and population variances, but I will ignore this and use your notation.

Scaling the $x$ simply scales the slope as well. If you replaced every $x_i$ by $x_i' = x_i/2$, you'd expect the slope estimate to double. This is the basis for what is happening in the formulas.
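This inverse scaling is easy to confirm numerically. A quick sketch on illustrative data, checking that dividing every $x_i$ by $2$ doubles the fitted slope:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 3.0 * x + rng.normal(scale=0.2, size=50)

slope, _ = np.polyfit(x, y, 1)            # slope on the original x
slope_scaled, _ = np.polyfit(x / 2.0, y, 1)  # slope after halving every x_i

print(np.isclose(slope_scaled, 2.0 * slope))
```

The relation $\hat b(x/c) = c\,\hat b(x)$ holds exactly, not just approximately, because both the covariance and the variance in the slope formula rescale deterministically.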

More formally, note that $$\bar x' = \frac{1}{n} \sum_{i=1}^n x_i' = \frac{1}{n} \sum_{i=1}^n \frac{x_i}{\sigma_{xx}} = \frac{\bar x}{\sigma_{xx}}.$$ Then we can see that $$n \sigma_{x'x'} = \sum_{i=1}^n (x_i' - \bar x')^2 = \sum_{i=1}^n \frac{(x_i - \bar x)^2}{\sigma_{xx}^2} = n.$$ It immediately follows that $$\hat \beta' = \frac{\sigma_{x'y}}{\sigma_{x'x'}} = \sigma_{x'y}.$$ Your error is in scaling the observations but not their mean: you must compute the sum of squares of the form $(x_i' - \bar x')^2$, rather than $(x_i' - \bar x)^2$, which uses the wrong mean.
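The textbook's claim can also be verified directly on data. A minimal sketch, assuming (as in the question's slope formula) that $\sigma_{xx}$ denotes the population-style $1/n$ variance; it checks that the covariance of the rescaled $x'$ with $y$ reproduces the original slope estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(scale=3.0, size=200)   # scale far from 1 so the check is non-trivial
y = -1.5 * x + rng.normal(size=200)

def pop_cov(u, v):
    """Population (1/n) covariance of two samples."""
    return np.mean((u - u.mean()) * (v - v.mean()))

sigma_xx = pop_cov(x, x)              # variance of x
b_hat = pop_cov(x, y) / sigma_xx      # original slope estimate

x_prime = x / sigma_xx                # rescaled observations x_i' = x_i / sigma_xx
print(np.isclose(pop_cov(x_prime, y), b_hat))
```

This works because covariance is linear in each argument: $\sigma_{x'y} = \sigma_{xy}/\sigma_{xx}$, which is exactly $\hat b$.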