Deriving line of best fit with least squares


I know there are a lot of resources on this but I want to derive it.

So let's say we have $n$ known points $(x_k, y_k)$ and we want a line $y = mx + b$ that best approximates the relationship. We approximate by minimizing the sum of the squared distances.

In other words we minimize

$s = \sum (y_k - (mx + b))^2 = \sum (y_k - mx - b)^2$

Set derivative to $0$:

$\frac{d}{dx} s = \sum -2m(y_k - mx - b) = 0$

Is this correct so far? I'm not quite sure where to go from here.



What do you really want to solve? Keep in mind that the goal is to obtain the unknown coefficients $m$ and $b$ that minimize the following functional $$S(m,b) = \sum_k{[y_k-(mx_k+b)]^2}$$

What $S(m,b)$ represents is the total squared distance between the true values $y_k$ of the data pairs $(x_k,y_k)$ and your approximating straight line $\tilde{y}_k=mx_k+b$; each point contributes $(y_k-\tilde{y}_k)^2$.

The square in the functional $S(m,b)$ makes it convex, so its only extremum is a minimum.

In order to find the minimum of the surface given by the functional $S(m,b)$, one must compute its (2-dimensional) gradient and set it to zero: $$\vec{grad}\,S(m,b)=\vec{0}$$ This gives us two equations for the two unknowns $m$ and $b$.

These two equations read: $$\frac{\partial S(m,b)}{\partial m}=0$$ $$\frac{\partial S(m,b)}{\partial b}=0$$ Can you keep going from that?
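Expanding the two partial derivatives and collecting terms turns them into a $2\times 2$ linear system in $m$ and $b$. Here is a minimal Python sketch of that step, using made-up data (the values of `xs` and `ys` are hypothetical, purely for illustration):

```python
# The two normal equations dS/dm = 0 and dS/db = 0 expand to the
# 2x2 linear system:
#   m*sum(x_k^2) + b*sum(x_k) = sum(x_k*y_k)
#   m*sum(x_k)   + b*n        = sum(y_k)

xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8]

n = len(xs)
Sx = sum(xs)
Sy = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 system by Cramer's rule.
det = n * Sxx - Sx * Sx
m = (n * Sxy - Sx * Sy) / det
b = (Sxx * Sy - Sx * Sxy) / det

print(m, b)
```

The determinant $n\sum x_k^2 - (\sum x_k)^2$ is zero only when all the $x_k$ are equal, in which case no best-fit line of the form $y=mx+b$ exists.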


Note that $y = m x +b = \pmatrix{1 & x} \pmatrix{b \\ m}$

So make a matrix $A$ for all the $n$ points such as

$$ {\rm A} = \pmatrix{ 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_k \\ \vdots & \vdots } $$

and a vector of the $y$ values

$$ {\bf y} = \pmatrix{ y_1 \\ y_2 \\ \vdots \\ y_k \\ \vdots} $$

Then you are trying to solve

$$ {\bf y} = {\rm A} {\bf c} $$

for ${\bf c}=\pmatrix{b \\ m}$.

The least squares method is

$$ {\rm A}^\top {\bf y} = ({\rm A}^\top {\rm A}) {\bf c} \Rightarrow$$

$$ \boxed{ {\bf c} = ({\rm A}^\top {\rm A})^{-1} {\rm A}^\top {\bf y} } $$

Note that ${\rm A}^\star = ({\rm A}^\top {\rm A})^{-1} {\rm A}^\top$ is called the pseudoinverse of ${\rm A}$.

Expanded out this gives

$$ {\bf c} = \pmatrix{ n & {\bf 1}^\top {\bf x} \\ {\bf x}^\top {\bf 1} & {\bf x}^\top {\bf x}}^{-1} \pmatrix{ {\bf 1}^\top {\bf y} \\ {\bf x}^\top {\bf y} } $$

which you can expand further to get a direct expression in terms of the sums $\sum x_k$, $\sum x_k^2$, $\sum y_k$, and $\sum x_k y_k$. Here ${\bf x} = \pmatrix{x_1 & x_2 & \cdots & x_k & \cdots}^\top$ and ${\bf 1}$ is the vector of all ones, so ${\bf 1}^\top {\bf 1} = n$.
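The matrix formulation above can be sketched directly in NumPy. This is a minimal illustration with made-up data chosen to lie exactly on $y = 2x + 1$, so the recovered coefficients are exact:

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Build A with a column of ones and a column of x values.
A = np.column_stack([np.ones_like(x), x])

# Normal-equations solution c = (A^T A)^{-1} A^T y,
# computed by solving the linear system rather than inverting.
c = np.linalg.solve(A.T @ A, A.T @ y)
b, m = c
print(b, m)
```

In practice `np.linalg.lstsq(A, y, rcond=None)` is preferred over forming ${\rm A}^\top{\rm A}$ explicitly, since it uses an SVD-based method that is numerically more stable and handles rank-deficient ${\rm A}$.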


Remember that $x_k$ is your data for the independent variable. There are the same number of $x$'s as $y$'s. So you need an index. You are minimizing $ SS =\sum_k (y_k-(mx_k+b))^2 $

As another answer indicated your problem is you want to minimize in terms of $m$ and $b$. (You are estimating your coefficients. The data can't change.)

So you have $$\frac{\partial}{\partial m } SS = -2\sum_k x_k(y_k-(mx_k+b)) = 0 \\ \frac{\partial}{\partial b} SS = -2\sum_k (y_k-(mx_k+b)) = 0 $$ As the other answer indicated, you know the solution to these equations will be the unique minimum because $SS$ is an upward-opening paraboloid as a function of $m$ and $b$.

Now solve these two equations for $b$ and $m.$ You may find it convenient to introduce the notation for the sample mean $ \bar x = \frac{1}{n}\sum_k x_k.$
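Carrying out that suggestion, the solution can be written with sample means as $m = \sum_k (x_k-\bar x)(y_k-\bar y) \,/\, \sum_k (x_k-\bar x)^2$ and $b = \bar y - m\bar x$. A short Python sketch of these formulas, with made-up data:

```python
# Closed-form solution in sample-mean form:
#   m = sum((x_k - xbar)(y_k - ybar)) / sum((x_k - xbar)^2)
#   b = ybar - m * xbar

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.1, 5.9, 8.0, 10.2]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

m = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))
b = ybar - m * xbar

print(m, b)
```

Note that $b = \bar y - m\bar x$ says the fitted line always passes through the point of means $(\bar x, \bar y)$.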


The treatment is simpler and more generalizable if you write, instead of $y_k\approx mx_k+b$, the vector equation $$ y \approx X\beta $$ where $\beta = (b,m)^T$ is your vector of coefficients and $X$ is an $n\times 2$ matrix whose first column is all ones and whose second column is the vector of $x$'s.

(It is generalizable because you can easily add more regressors by adding additional columns to $X$ and corresponding coefficients to $\beta.$)

Then to get the least squares estimator for $\beta$ we minimize $SS = \|y-X\beta\|^2$. The minimum is again given by setting the gradient to zero: $$0=\nabla_{\beta} \|y -X\beta\|^2 =-2 X^T(y-X\beta)$$ so we have $$ X^TX\beta=X^Ty \\\Rightarrow \beta = (X^TX)^{-1}X^T y$$ provided $X^TX$ is invertible.

(And all this invertibility requires is that $X$ is full column rank. In other words your regressors cannot be linearly dependent. So, for instance, in your case of a simple best fit line, invertibility fails only if the $x$'s are all the same... which would mean your points all lie on a vertical line.)
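To illustrate how extra regressors slot in, here is a minimal NumPy sketch that adds a quadratic column to $X$. The data are made up and noiseless (generated from $y = 1 + 2x + 3x^2$), so the fit recovers the coefficients exactly:

```python
import numpy as np

# Hypothetical noiseless data from y = 1 + 2x + 3x^2.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2

# X gains one column per regressor; it is full column rank
# as long as the columns are linearly independent.
X = np.column_stack([np.ones_like(x), x, x**2])

# Solve the normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)   # recovers [1, 2, 3]
```

The model is still *linear* least squares because $y$ is linear in $\beta$, even though the fitted curve is quadratic in $x$.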