Math theory behind the slope formula of Least Squares


When studying linear regression, I see the slope formula of least squares written like this: $a = \frac{N\sum x_n y_n - \sum x_n \sum y_n}{N\sum x_n^2 - \left(\sum x_n\right)^2}$. But I couldn't find the math theory behind it, i.e. how mathematicians came up with this equation, or why this equation gives the best-fitting line. Can you please explain it for me or point me to the right resource, like a website or a book? Thanks.


2 Answers

Best answer

You can derive them by minimizing the sum of the squared errors between your $y$ data values and the trend-line predictions.

$$ SSE = \sum_{n=1}^N \Big( y_n - (ax_n+b) \Big)^2$$
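This objective is easy to write as a function. The sketch below is a minimal illustration, assuming NumPy and some made-up sample data (the arrays `x` and `y` are hypothetical, not from the question):

```python
import numpy as np

# Hypothetical paired data (x_n, y_n); any two equal-length arrays work.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(a, b, x, y):
    """Sum of squared errors between the data y and the line a*x + b."""
    return np.sum((y - (a * x + b)) ** 2)
```

Least squares seeks the pair $(a, b)$ that makes `sse(a, b, x, y)` as small as possible.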


The basic idea is to think of SSE as a paraboloid in the variables $a$ and $b$. The vertex of the paraboloid will tell us the values of $a$ and $b$ which minimize the SSE.

In the form written above it is not easy to discern what the vertex is. We will have to do a bit of algebra to get there.

$$ SSE = \sum_{n=1}^N \Big( y_n - (ax_n+b) \Big)^2$$

First expand the binomial

$$ = \sum_{n=1}^N \Big[ y_n^2 + (ax_n+b)^2 - 2 y_n (ax_n+b) \Big] $$

$$ = \sum_{n=1}^N \Big[ y_n^2 + a^2 x_n^2 +b^2 + 2ab x_n - 2 a y_n x_n - 2 b y_n \Big]$$

Now distribute the sum

$$ = \sum_{n=1}^N y_n^2 + a^2 \sum_{n=1}^N x_n^2 + b^2 \sum_{n=1}^N 1 + 2ab \sum_{n=1}^N x_n - 2 a \sum_{n=1}^N y_n x_n - 2 b \sum_{n=1}^N y_n $$

We will introduce the shorthand notation

$$ \overline{z} = \frac{1}{N} \sum_{n=1}^N z_n,$$

our expression then can be written as,

$$ SSE = N \overline{y^2} + a^2 N \overline{x^2} + b^2 N + 2ab N\overline{x} - 2 a N \overline{xy} - 2 b N \overline{y}$$

$$ SSE = \Big( N \overline{x^2} \Big)a^2 + \Big( 2 N\overline{x} \Big) ab + \Big( N \Big) b^2 + \Big( - 2 N \overline{xy}\Big) a + \Big(- 2 N \overline{y}\Big) b + \Big(N \overline{y^2} \Big)$$

Now, to find the vertex, we compute the partial derivatives with respect to $a$ and $b$ and set them equal to $0$.

$$ \frac{\partial SSE}{\partial a} = \Big( 2N \overline{x^2} \Big)a + \Big( 2 N\overline{x} \Big) b + \Big( - 2 N \overline{xy}\Big) = 0 \qquad \text{(A)}$$

$$ \frac{\partial SSE}{\partial b} = \Big( 2 N\overline{x} \Big) a + \Big( 2N \Big) b + \Big(- 2 N \overline{y}\Big) = 0 \qquad \text{(B)} $$
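Equations (A) and (B) are the so-called normal equations: a $2 \times 2$ linear system in $a$ and $b$. Before solving it by hand, it can be checked numerically. A minimal sketch, assuming NumPy and hypothetical sample data (dividing both equations through by $2N$ first):

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Normal equations (A) and (B), divided through by 2N:
#   [ mean(x^2)  mean(x) ] [a]   [ mean(x*y) ]
#   [ mean(x)      1     ] [b] = [ mean(y)   ]
M = np.array([[np.mean(x**2), np.mean(x)],
              [np.mean(x),    1.0]])
rhs = np.array([np.mean(x * y), np.mean(y)])
a, b = np.linalg.solve(M, rhs)
```

Solving the system directly agrees with what a library fit (e.g. `np.polyfit(x, y, 1)`) returns.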

We now need to solve this system of equations for $a$ and $b$. We can eliminate $b$ and solve for $a$ by multiplying equation B by $\overline{x}$ and subtracting it from equation A.

$$ A - B \overline{x}$$

$$ \Big( 2N \overline{x^2} \Big)a + \Big( 2 N\overline{x} \Big) b + \Big( - 2 N \overline{xy}\Big) -\Big( 2 N\overline{x}^2 \Big) a - \Big( 2N \overline{x} \Big) b - \Big(- 2 N \overline{y}\overline{x} \Big) = 0 $$

$$ \Big( 2N \overline{x^2} - 2N \overline{x}^2 \Big)a - \Big( 2 N \overline{xy} - 2N \overline{x} \overline{y} \Big) = 0 $$

$$ \Big( \overline{x^2} - \overline{x}^2 \Big)a = \Big( \overline{xy} - \overline{x} \overline{y} \Big) $$

$$ a = \frac{\overline{xy} - \overline{x} \overline{y} }{\overline{x^2} - \overline{x}^2 } $$

$$a = \frac{ \frac{1}{N} \sum x_n y_n - \frac{1}{N^2} \sum x_n \sum y_n}{ \frac{1}{N} \sum x_n^2 - \frac{1}{N^2} \Big(\sum x_n \Big)^2 }$$

$$ \boxed{ a = \frac{ N \sum x_n y_n - \sum x_n \sum y_n}{ N \sum x_n^2 - \Big(\sum x_n \Big)^2 }} $$
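The boxed formula and the mean form two lines above it are algebraically identical, which is easy to confirm numerically. A minimal check, assuming NumPy and hypothetical sample data:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N = len(x)

# Boxed slope formula, in terms of raw sums.
a = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
    (N * np.sum(x**2) - np.sum(x)**2)

# Equivalent mean form: (mean(xy) - mean(x)mean(y)) / (mean(x^2) - mean(x)^2),
# i.e. the sample covariance of x and y over the sample variance of x.
a_mean_form = (np.mean(x * y) - np.mean(x) * np.mean(y)) / \
              (np.mean(x**2) - np.mean(x)**2)
```

The mean form makes the statistical reading visible: the least-squares slope is the covariance of $x$ and $y$ divided by the variance of $x$.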

We can eliminate $a$ from the equations by taking the difference of equation A times $\overline{x}$ and equation B times $\overline{x^2}$.

$$ A \overline{x} - B \overline{x^2} $$

$$ \Big( 2N \overline{x^2}\overline{x} \Big)a + \Big( 2 N\overline{x}^2 \Big) b + \Big( - 2 N \overline{xy} \overline{x} \Big) - \Big( 2 N\overline{x} \overline{x^2} \Big) a - \Big( 2N \overline{x^2} \Big) b - \Big(- 2 N \overline{y} \overline{x^2} \Big) = 0 $$

$$ \Big( 2 N\overline{x}^2 - 2N \overline{x^2} \Big) b - \Big(2 N \overline{xy} \overline{x} - 2 N \overline{y} \overline{x^2} \Big) = 0 $$

$$ \Big( 2 N\overline{x}^2 - 2N \overline{x^2} \Big) b = \Big(2 N \overline{xy} \overline{x} - 2 N \overline{y} \overline{x^2} \Big) $$

$$ b = \frac{ 2 N \overline{xy} \overline{x} - 2 N \overline{y} \overline{x^2} }{2 N\overline{x}^2 - 2N \overline{x^2}} $$

$$ b = \frac{ \overline{xy} \overline{x} - \overline{y} \overline{x^2} }{ \overline{x}^2 - \overline{x^2}} $$
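There is also a shortcut: equation (B) rearranges directly to $b = \overline{y} - a\,\overline{x}$, which is equivalent to the formula above once the slope is known. A minimal numerical check of both forms, assuming NumPy and hypothetical sample data:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope in mean form, as derived above.
a = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x)**2)

# Intercept formula derived by eliminating a.
b = (np.mean(x * y) * np.mean(x) - np.mean(y) * np.mean(x**2)) / \
    (np.mean(x)**2 - np.mean(x**2))

# Shortcut from equation (B): the fitted line passes through (mean(x), mean(y)).
b_short = np.mean(y) - a * np.mean(x)
```

The shortcut also shows a well-known geometric fact: the least-squares line always passes through the point $(\overline{x}, \overline{y})$.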

Another answer

The argument is given (and criticized) in Numerical Recipes, and I suspect in many other numerical analysis books. We assume, without evidence, that the measurement errors are normally distributed, and then note that the least-squares parameters are the ones that maximize the probability of obtaining the measurements we actually got. There is a subtle flip here, between the parameters that make our measurements most probable and the parameters that are themselves most probable given the measurements. The normal distribution is very convenient because we can prove things based on it, and the equations determining the parameters are easy to solve. Whether it reflects reality is not a mathematical question.
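To make the connection explicit (this is the standard maximum-likelihood derivation, not specific to any one book): if each measurement error is independent and normally distributed with mean $0$ and variance $\sigma^2$, the likelihood of the observed data given a candidate line $y = ax + b$ is

$$ L(a,b) = \prod_{n=1}^N \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\big(y_n - (ax_n + b)\big)^2}{2\sigma^2} \right), $$

so that

$$ -\log L(a,b) = \frac{SSE}{2\sigma^2} + \frac{N}{2}\log\big(2\pi\sigma^2\big). $$

Maximizing the likelihood over $a$ and $b$ is therefore exactly the same as minimizing the SSE, which is why the normality assumption leads to least squares.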

There is a famous quote attributed to Poincaré to the effect that physicists believe the normal distribution is a mathematical theorem, while mathematicians believe it is an experimental fact.