In stack overflow there are at least two questions asking how to find the straight line in a 2D space nearest to a given set of points $\vec{a}_i$ for $i = 1...N$: question 1 question 2.
In both cases the answer has been use a linear regression. That surprises to me, not only because linear regression doesn't covers the case of a vertical line, but because the measure of error is the "y" distance, not the euclidean distance from point to line. I've decided to check if the answer is the same.
The line will be defined by the vector $\vec{r}$ from origin to the nearest point of line. In polar coordinates lets say $\vec{r}=r \hat{\theta}$.
Error is then defined by the sum for all points of the individual quadratic error $\epsilon_i^2=(\vec{a}_i\hat{\theta}-r)^2$
Partial derivative on $r$ gives $r$ must be the average for all $i$ of $\vec{a}_i\hat{\theta}$.
Partial derivative on $\theta$ gives $ \frac{\partial}{\partial{\theta}}\epsilon_i^2= 2 \epsilon_i \vec{a}_i \frac{\partial \hat{\theta}}{\partial\theta}=2 (\vec{a}_i\hat{\theta}-r) \vec{a}_i \frac{\partial \hat{\theta}}{\partial\theta}$.
where $\frac{\partial \hat{\theta}}{\partial\theta}$ is a normal vector in the direction of the line, orthogonal to $\hat{\theta}$.
I do not know how I can simplify second equation till reach a solution expression that can be applied without need to solve the system using numerical methods (somethign I need for a software application about drawings).
As consequence, I can not answer myself the original question about if linear regression gives the correct optimal result or how far it is from the correct one. By example, if the solution is a vertical line or near than vertical line, is the linear regression result accurate or far from the optimal ?.
What you are after is called total least squares.
Let the implicit equation of the line be
$$x\cos\theta+y\sin\theta+p=0$$
and you want to minimize
$$\sum(x\cos\theta+y\sin\theta+p)^2.$$
The optimal solution can be show to be centered on the centroid (by linearity) and the equation can be rewritten
$$(x-\overline x)\cos\theta+(y-\overline y)\sin\theta=0.$$
Now, assuming that the data has been centered, the minimizer of the least squares equation writes
$$0=2\sum(x\cos\theta+y\sin\theta)(-x\sin\theta+y\cos\theta)\\=(\cos^2\theta-\sin^2\theta)\sum2xy-2\sin\theta\cos\theta\sum(x^2-y^2).$$
Using the double angle formulas, you obtain the value of $\tan2\theta$.
Note that the last equation leaves a $\dfrac{k\pi}2$ undeterminacy on $\theta$, because there are two minima and two maxima over a full turn.