I know there are a lot of resources on this but I want to derive it.
So let's say we have $n$ known points of the form $(x_k, y_k)$ and we want a line $y = mx + b$ that best approximates the relationship. We approximate by minimizing the sum of the squared vertical distances between the points and the line.
In other words we minimize
$s = \sum (y_k - (mx + b))^2 = \sum (y_k - mx - b)^2$
Set derivative to $0$:
$\frac{d}{dx} s = \sum -2m(y_k - mx - b) = 0$
Is this correct so far? I'm not quite sure where to go from here.
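What do you really want to solve for? Keep in mind that the goal is to find the unknown coefficients $m$ and $b$ that minimise the following functional $$S(m,b) = \sum_k{[y_k-(mx_k+b)]^2}$$ Note that each term contains $x_k$, not a free variable $x$: the data points are fixed, so the derivative must be taken with respect to $m$ and $b$, not with respect to $x$.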
What $S(m,b)$ measures is the total squared vertical distance between the observed values $y_k$ of the pairs $(x_k,y_k)$ and the values $\tilde{y}_k=mx_k+b$ given by your approximating straight line. Each point contributes $(y_k-\tilde{y}_k)^2$ to the sum.
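Because $S(m,b)$ is a sum of squares, it is a convex quadratic function of $m$ and $b$, so any stationary point is a minimum. That minimum is unique as long as the $x_k$ are not all equal.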
To find the minimum of the surface given by the functional $S(m,b)$, one must compute its (2-dimensional) gradient and set it to zero. $$\vec{grad}\,S(m,b)=\vec{0}$$ This vector equation gives us 2 scalar equations for the 2 unknowns $m$ and $b$.
These two equations read: $$\frac{\partial S(m,b)}{\partial m}=0$$ $$\frac{\partial S(m,b)}{\partial b}=0$$ Can you keep going from that?
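For reference, here is a sketch of how the computation can proceed from these two conditions, using only the definition of $S(m,b)$ above and taking $n$ to be the number of points, as in the question. Differentiating term by term gives $$\frac{\partial S}{\partial m}=-2\sum_k x_k\,(y_k-mx_k-b)=0$$ $$\frac{\partial S}{\partial b}=-2\sum_k (y_k-mx_k-b)=0$$ Dividing by $-2$ and separating the sums turns this into a linear system in $m$ and $b$: $$m\sum_k x_k^2+b\sum_k x_k=\sum_k x_k y_k$$ $$m\sum_k x_k+b\,n=\sum_k y_k$$ Assuming the $x_k$ are not all equal, so that the denominator below is nonzero, solving the system gives $$m=\frac{n\sum_k x_k y_k-\sum_k x_k\sum_k y_k}{n\sum_k x_k^2-\left(\sum_k x_k\right)^2},\qquad b=\frac{1}{n}\left(\sum_k y_k-m\sum_k x_k\right)$$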