I was following a linear algebra course, and I came upon an example where a linear regression was done by solving $A^{\rm T}Ax = A^{\rm T}b$, where $Ax = b$ could not be solved because $b$ is not in the column space of $A$. So the formula $A^{\rm T}Ax = A^{\rm T}b$ is derived as the projection of $b$ onto the column space of $A$, and solved for as an approximation.
What I am confused about is where in this derivation it became a least-squares approximation. How would I formulate the problem in a similar way (as a linear algebra problem), but minimise the absolute distance instead? (Edited: I originally wrote "Euclidean distance", which was improper use of terminology.)
What I mean to ask is when fitting a line to a set of points, if I wanted to minimize the absolute error, not the squared error, how would I formulate the problem?
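For the absolute-error version of the question, one standard formulation is a linear program (least absolute deviations): introduce slack variables $t_i \ge |(Ax-b)_i|$ and minimise $\sum_i t_i$ subject to $-t_i \le (Ax-b)_i \le t_i$. A minimal sketch using SciPy's `linprog`; the data points here are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data: fit y = c0 + c1*t to the points (t_i, y_i)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 1.5, 3.5])
A = np.column_stack([np.ones_like(t), t])  # design matrix [1, t]
m, n = A.shape

# Variables z = [x (n coefficients), s (m slack variables)].
# Minimise sum(s) subject to  A x - s <= b  and  -A x - s <= -b,
# which together enforce |A x - b|_i <= s_i.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
bounds = [(None, None)] * n + [(0, None)] * m  # x free, s >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_l1 = res.x[:n]  # least-absolute-deviations coefficients
```

At the optimum each $s_i$ equals $|(Ax-b)_i|$, so the objective value `res.fun` is exactly the total absolute error of the fitted line.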
Given an approximate solution $x$ we can define the squared error in the approximation as $E(x) = \| A x - b \|^2 = (Ax-b)^T(Ax-b)$.
The $x$ that minimises this "error" is the least squares solution.
$E$ has gradient $\nabla E(x) = 2 A^T(Ax-b)$. Since $E$ is convex, it attains its minimum where $\nabla E = 0$, i.e. when $A^TAx - A^Tb = 0$, which is exactly the condition you want.
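To see this numerically, here is a small sketch (the matrix and vector are made up, with $b$ chosen outside the column space of $A$): solving the normal equations $A^TAx = A^Tb$ gives the same $x$ as a direct least-squares solver.

```python
import numpy as np

# Overdetermined system: three points, a two-parameter line, so
# b is not in the column space of A and Ax = b has no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq minimises ||Ax - b||^2 directly; the two agree
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both give $x = (5, -3)$: the line $y = 5 - 3t$ is the best fit, even though no line passes through all three points.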