If we are given a set of points $\{(x_i,y_i)\}$ and want to fit a straight line $ax+b$ "as close" to the points as possible, we set up a system of equations.
One way is to take the partial derivatives of the squared error with respect to $a,b$ and find the minimum.
In the matrix form we multiply by $A^T$, where the rows of $A$ have entries $1,x,x^2,\dots$ Why is this similar to finding the minimum using derivatives?
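To make the comparison concrete, here is a minimal numerical sketch for the straight-line case. The sample points are made up for illustration, and the variable names (`M`, `rhs`, `a_deriv`, `a_mat`, and so on) are not from any reference. It solves the two equations obtained from the partial derivatives, then solves the system obtained by multiplying by $A^T$, and checks that both routes build the same equations.

```python
import numpy as np

# Hypothetical sample points, chosen only for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Route 1: set the partial derivatives of
#   S(a, b) = sum_i (a*x_i + b - y_i)^2
# to zero, which gives two linear equations:
#   dS/da = 0  ->  a*sum(x^2) + b*sum(x) = sum(x*y)
#   dS/db = 0  ->  a*sum(x)   + b*n      = sum(y)
n = len(x)
M = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     n]])
rhs = np.array([np.sum(x * y), np.sum(y)])
a_deriv, b_deriv = np.linalg.solve(M, rhs)

# Route 2: write the overdetermined system A p = y with p = (a, b),
# where A has columns x and 1, then multiply both sides by A^T.
A = np.column_stack([x, np.ones_like(x)])
a_mat, b_mat = np.linalg.solve(A.T @ A, A.T @ y)

# A^T A and A^T y contain exactly the sums that appear in Route 1.
print(np.allclose(A.T @ A, M), np.allclose(A.T @ y, rhs))
print(a_deriv, b_deriv)
print(a_mat, b_mat)
```

For the line fit, then, multiplying by $A^T$ is just a compact way of writing the two derivative conditions at once. For badly conditioned problems, a routine such as `np.linalg.lstsq` is usually a safer choice than forming $A^TA$ explicitly.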
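There is a cookbook of matrix and vector calculus identities on the internet, with which we can derive $${\bf x_o}=\arg\min_{\bf x}\{\|{\bf Ax-b}\|_2^2\}$$ by expanding the squared norm, differentiating with respect to ${\bf x}$, and setting the gradient to zero.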
Doing these steps is a good exercise.
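For reference, a sketch of how the derivation can go, assuming ${\bf A}$ has full column rank so that ${\bf A}^T{\bf A}$ is invertible (an assumption not stated above). It uses the standard identities $\nabla_{\bf x}({\bf x}^T{\bf M}{\bf x})=2{\bf M}{\bf x}$ for symmetric ${\bf M}$ and $\nabla_{\bf x}({\bf c}^T{\bf x})={\bf c}$:

$$\begin{aligned}
\|{\bf Ax-b}\|_2^2 &= ({\bf Ax-b})^T({\bf Ax-b}) = {\bf x}^T{\bf A}^T{\bf A}{\bf x} - 2\,{\bf b}^T{\bf A}{\bf x} + {\bf b}^T{\bf b},\\
\nabla_{\bf x}\,\|{\bf Ax-b}\|_2^2 &= 2\,{\bf A}^T{\bf A}{\bf x} - 2\,{\bf A}^T{\bf b} = {\bf 0}
\;\Longrightarrow\; {\bf A}^T{\bf A}\,{\bf x_o} = {\bf A}^T{\bf b},\\
{\bf x_o} &= ({\bf A}^T{\bf A})^{-1}{\bf A}^T{\bf b}.
\end{aligned}$$

The middle line is the same as multiplying ${\bf Ax}={\bf b}$ by ${\bf A}^T$, which is why the two approaches agree. Each row of ${\bf A}^T{\bf A}{\bf x}={\bf A}^T{\bf b}$ is one partial derivative set to zero.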