Given: $y = b_0 + b_1x$
I am wondering what the explanation is behind this formula for estimating the coefficient $b_1$:
$$ b_1 = \frac{\sum_{i=1}^n( x_i-\bar{x})(y_i-\bar{y})}{ \sum_{i=1}^n( x_i-\bar{x})^2 } $$
What are the steps to derive this formula?
Part 1 Update - March 18, 2021:
When I tried to substitute $\bar{y} - b_1\bar{x}$ for $b_0$ in
$$ b_0 \bar{x} + b_1 \overline{x^2} = \overline{xy} $$ I got stuck with $b_1$ on both sides of the equation:
$$ b_1 \overline{x^2} = \overline{xy}-(\overline{x} \bar{y} - b_1\overline{x^2}) $$
Could you please guide me through the remaining derivation steps? Thanks.
Part 2 Update
With further help from @MartinVesely, I realized that this should be:
$$b_0 \bar{x} + b_1 \overline{x^2} = \overline{xy}$$
$$ ((\bar{y} - b_1\bar{x})\bar{x}) + b_1 \overline{x^2} = \overline{xy}$$
$$(\bar{x}\bar{y} - b_1(\bar{x})^2) + b_1 \overline{x^2} = \overline{xy}$$
$$( - b_1(\bar{x})^2) + b_1 \overline{x^2} = \overline{xy} - \bar{x}\bar{y} $$
$$b_1( -(\bar{x})^2 + \overline{x^2}) = \overline{xy} - \bar{x}\bar{y} $$
$$b_1 = \frac{\overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} -(\bar{x})^2} $$
The formula is derived using the least squares method.
First, write down the function $L = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2$. This is the sum of squared differences between the observed outputs $y_i$ and the outputs predicted by the regression line.
Our goal is to minimize the difference between the actual data and the regression line, that is, to minimize $L$. To do so, we calculate the first partial derivatives with respect to $b_0$ and $b_1$:
$$ \frac{\partial L}{\partial b_0} = -\sum_{i=1}^n 2(y_i - b_0 - b_1 x_i) $$
$$ \frac{\partial L}{\partial b_1} = -\sum_{i=1}^n 2x_i(y_i - b_0 - b_1 x_i) $$
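If you want to convince yourself that these two partial derivatives are right, a small numerical check can help. The following is only a sketch: the sample data and the function names `loss`, `gradient` and `numerical_gradient` are made up for illustration. It compares the analytic expressions above with central finite differences of $L$.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def loss(b0, b1):
    """Sum of squared residuals L(b0, b1)."""
    return np.sum((y - b0 - b1 * x) ** 2)

def gradient(b0, b1):
    """Analytic partial derivatives (dL/db0, dL/db1) from the formulas above."""
    residuals = y - b0 - b1 * x
    return np.array([-2.0 * np.sum(residuals), -2.0 * np.sum(x * residuals)])

def numerical_gradient(b0, b1, h=1e-6):
    """Central finite-difference approximation of the same two derivatives."""
    d0 = (loss(b0 + h, b1) - loss(b0 - h, b1)) / (2.0 * h)
    d1 = (loss(b0, b1 + h) - loss(b0, b1 - h)) / (2.0 * h)
    return np.array([d0, d1])

# Compare the two at an arbitrary point (b0, b1) = (0.5, 1.5).
print(gradient(0.5, 1.5))
print(numerical_gradient(0.5, 1.5))
```

The two printed vectors should agree up to rounding, since $L$ is quadratic in $b_0$ and $b_1$.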
Now, by setting $\frac{\partial L}{\partial b_0}$ and $\frac{\partial L}{\partial b_1}$ equal to zero and dividing by -2 we have
$$ \sum_{i=1}^n (y_i - b_0 - b_1 x_i) = 0 $$
$$ \sum_{i=1}^n x_i(y_i - b_0 - b_1 x_i) = 0 $$
Rewriting leads to $$ \sum_{i=1}^n (y_i - b_0 - b_1 x_i) = \sum_{i=1}^n y_i - b_1\sum_{i=1}^n x_i - nb_0 = 0 $$
$$ \sum_{i=1}^n x_i(y_i - b_0 - b_1 x_i) = \sum_{i=1}^n x_iy_i - b_1\sum_{i=1}^n x_i^2 -b_0\sum_{i=1}^n x_i = 0 $$
Now, if we divide both equations by $n$ and rearrange them, we have $$ b_0 + b_1 \bar{x} = \bar{y} $$
$$ b_0 \bar{x} + b_1 \overline{x^2} = \overline{xy}, $$
where $\bar{x}$ is the average of the $x_i$ values (similarly $\bar{y}$ for the $y_i$), $\overline{x^2}$ is the average of the squares $x_i^2$, and $\overline{xy}$ is the average of the products $x_iy_i$.
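These two equations form a $2 \times 2$ linear system in $b_0$ and $b_1$, so they can also be solved numerically. Here is a minimal sketch, assuming NumPy is available; the sample data are hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Coefficient matrix and right-hand side of the two averaged equations:
#   b0 * 1       + b1 * mean(x)   = mean(y)
#   b0 * mean(x) + b1 * mean(x^2) = mean(x*y)
A = np.array([[1.0, x.mean()],
              [x.mean(), (x ** 2).mean()]])
rhs = np.array([y.mean(), (x * y).mean()])

# Solve the linear system for the intercept b0 and the slope b1.
b0, b1 = np.linalg.solve(A, rhs)
print(b0, b1)

# np.polyfit with degree 1 returns [slope, intercept] of the least squares line.
print(np.polyfit(x, y, 1))
```

The system has a unique solution as long as the $x_i$ are not all equal, because the determinant of the matrix is $\overline{x^2} - (\bar{x})^2$.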
From the first equation, $b_0 = \bar{y} - b_1\bar{x}$. Substituting this into the second equation gives $$ b_1 = \frac{\overline{xy} -\bar{x}\bar{y}}{\overline{x^2}-(\bar{x})^2}. $$
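As a sanity check, the slope can be computed both from the averages in this expression and from the centered sums in the formula given in the question. This is just a sketch; the function names `slope_from_averages` and `slope_from_centered_sums` and the sample data are hypothetical.

```python
import numpy as np

def slope_from_averages(x, y):
    """Slope as (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    numerator = (x * y).mean() - x.mean() * y.mean()
    denominator = (x ** 2).mean() - x.mean() ** 2
    return numerator / denominator

def slope_from_centered_sums(x, y):
    """Slope as sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / (dx ** 2).sum()

# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(slope_from_averages(x, y), slope_from_centered_sums(x, y))
```

Both functions should return the same value up to rounding. In floating point the centered-sum form is usually the safer choice, since the averages form subtracts two numbers that can be nearly equal when the mean of $x$ is large compared with its spread.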
Since $\overline{xy} -\bar{x}\bar{y}$ is the covariance of $x$ and $y$ and $\overline{x^2}-(\bar{x})^2$ is the variance of $x$, this is exactly your formula: $$ \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 $$ is the variance of $x$ and $$ \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) $$ is the covariance of $x$ and $y$, and the common factor $\frac{1}{n}$ cancels in the ratio.
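In case the equality of the two forms of the covariance is not obvious, here is the expansion written out, using only the averages defined above:

$$ \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum_{i=1}^n x_iy_i - \bar{x}\,\frac{1}{n}\sum_{i=1}^n y_i - \bar{y}\,\frac{1}{n}\sum_{i=1}^n x_i + \bar{x}\bar{y} = \overline{xy} - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y} = \overline{xy} - \bar{x}\bar{y} $$

Replacing $y_i$ with $x_i$ in the same expansion gives the variance identity:

$$ \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 = \overline{x^2} - (\bar{x})^2 $$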