I'm trying to solve the following problem (purely out of curiosity):
Let $P_1=(x_1,y_1),\dots,P_n=(x_n,y_n)$ be known distinct points in $\mathbb{R}^2$, with $n\geq 1$. Find $m$ and $b$ (not necessarily unique) such that the average of the (perpendicular) distances between the line $y=mx+b$ and the points is minimal.
My first attempt was to write down the average of the distances from the line to the points, and I obtained
$$\overline d(m,b)=\sum_{i=1}^n\frac{|mx_i-y_i+b|}{n\sqrt{m^2+1}}$$ The idea was to treat this as an ordinary function of two variables and apply the standard calculus optimization method. But I got stuck when I tried to compute the partial derivatives of this expression. The main problem with this approach is that, to get rid of the absolute values, we need to know the position of each point relative to the line, and this is something I want to avoid assuming.
If $n=1$ it is obvious that any line going through $P_1$ will do. For $n=2$ I think any line perpendicular to the segment $P_1P_2$ is a good choice (but I'm still working on that proof). From there on it is a lot more complicated, as expected.
Any ideas on how to approach this problem?
As you wrote, from a formal point of view, the problem seems to be very difficult (to say the least!).
Since one of the key problems is finding initial estimates for $m$ and $b$, what I would try as a first step is to minimize, with respect to $m$ and $b$, $$ f(m,b)=\sum_{i=1}^n\frac{(mx_i-y_i+b)^2}{m^2+1}$$ Computing the partial derivatives and setting them equal to $0$, we can, from $f_b(m,b)=0$, express $b$ as a linear function of $m$, and then $f_m(m,b)=0$ reduces to a quadratic equation in $m$. The solution I would keep is the one with the same sign as the slope $m$ from ordinary linear regression.
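To spell out this preliminary step, here is a sketch of the computation. The means $\bar x$, $\bar y$ and the centered sums $S_{xx}$, $S_{xy}$, $S_{yy}$ are notation introduced here for convenience. Setting $f_b(m,b)=0$ gives

$$b=\bar y-m\,\bar x,\qquad \bar x=\frac1n\sum_{i=1}^n x_i,\quad \bar y=\frac1n\sum_{i=1}^n y_i.$$

Substituting this into $f$, with $S_{xx}=\sum_i (x_i-\bar x)^2$, $S_{yy}=\sum_i (y_i-\bar y)^2$ and $S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)$,

$$f(m,\bar y-m\,\bar x)=\frac{m^2S_{xx}-2mS_{xy}+S_{yy}}{m^2+1},$$

and setting the derivative with respect to $m$ to $0$ leaves

$$S_{xy}\,m^2+(S_{xx}-S_{yy})\,m-S_{xy}=0.$$

Assuming $S_{xy}\neq 0$, the two roots have product $-1$, so they have opposite signs (they correspond to two perpendicular directions). The root with the sign of $S_{xy}$, which is also the sign of the regression slope $S_{xy}/S_{xx}$, is the one to keep:

$$m=\frac{S_{yy}-S_{xx}+\sqrt{(S_{yy}-S_{xx})^2+4S_{xy}^2}}{2S_{xy}}.$$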
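Starting from these estimates, and provided you have a robust optimization algorithm, we can hope to reach the minimum of $$\overline d(m,b)=\sum_{i=1}^n\frac{|mx_i-y_i+b|}{\sqrt{m^2+1}}$$ (the constant factor $1/n$ from your definition is dropped here, since it does not change the minimizing $m$ and $b$).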
I tried this with the following data points $$\left( \begin{array}{cc} x & y \\ 1 & 4 \\ 2 & 8 \\ 3 & 11 \\ 4 & 10 \\ 5 & 13 \end{array} \right)$$ The preliminary step leads to $$m=\frac{1}{25} \left(23+\sqrt{1154}\right)\approx 2.27882 \qquad b=\frac{1}{25} \left(161-3 \sqrt{1154}\right)\approx 2.36353$$ which corresponds to $ \overline d(m,b)\approx 2.31363$.
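For anyone who wants to reproduce the preliminary step numerically, here is a minimal Python sketch. It assumes the usual closed form for the root of the quadratic in $m$, written with centered sums; the function names `orthogonal_fit` and `sum_abs_distance` are made up for this illustration. The distance is the plain sum (no division by $n$), which is what the numbers quoted above correspond to.

```python
import math

def orthogonal_fit(xs, ys):
    """Closed-form minimizer of sum((m*x - y + b)**2) / (m**2 + 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums (notation introduced for this sketch)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy == 0:
        raise ValueError("S_xy = 0: degenerate case not handled in this sketch")
    # Root of s_xy*m**2 + (s_xx - s_yy)*m - s_xy = 0 having the sign of s_xy
    disc = math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)
    m = (s_yy - s_xx + disc) / (2 * s_xy)
    # From f_b = 0: the line passes through the centroid
    b = y_bar - m * x_bar
    return m, b

def sum_abs_distance(m, b, xs, ys):
    """Sum of perpendicular distances from the points to y = m*x + b."""
    return sum(abs(m * x - y + b) for x, y in zip(xs, ys)) / math.sqrt(m * m + 1)

xs = [1, 2, 3, 4, 5]
ys = [4, 8, 11, 10, 13]
m, b = orthogonal_fit(xs, ys)
print(m, b, sum_abs_distance(m, b, xs, ys))
```

The printed values should agree with the rounded figures quoted above for $m$, $b$ and $\overline d(m,b)$.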
The second step leads to the final result $m=2.25$, $b=1.75$, which corresponds to $ \overline d(m,b)\approx 2.03069$.
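This final result can be cross-checked without a general-purpose optimizer. The sketch below is a different technique from the numerical minimization used above: it relies on the property, treated here as an assumption rather than proved, that some line minimizing the sum of perpendicular distances passes through two of the data points, so it simply tries every pair. The function names are hypothetical, and vertical lines are skipped because they cannot be written as $y=mx+b$.

```python
import math
from itertools import combinations

def sum_abs_distance(m, b, points):
    """Sum of perpendicular distances from the points to y = m*x + b."""
    return sum(abs(m * x - y + b) for x, y in points) / math.sqrt(m * m + 1)

def best_line_through_two_points(points):
    """Try every line through two data points and keep the best one.

    Assumption of this sketch: an optimal line passes through two points.
    Returns (m, b, d), or None if no non-vertical candidate exists.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line: not representable as y = m*x + b
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        d = sum_abs_distance(m, b, points)
        if best is None or d < best[2]:
            best = (m, b, d)
    return best

points = [(1, 4), (2, 8), (3, 11), (4, 10), (5, 13)]
m, b, d = best_line_through_two_points(points)
print(m, b, d)
```

For the five data points above, the best candidate is the line through $(1,4)$ and $(5,13)$, which matches $m=2.25$, $b=1.75$. The enumeration evaluates the distance sum for each of the $n(n-1)/2$ pairs, so it is only practical for small to moderate $n$.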