Minimize the sum of distances to a line


Given $n$ points in $\mathbb{R}^2$ (I suspect nothing essential changes in $\mathbb{R}^n$), find the parameters $A$ and $v$ such that the sum of the distances from the points to the line $A+tv$ is minimized.

I know the classic variant of this problem, which asks for the line minimizing the sum of the squares of the distances, but for the sum of the distances themselves I couldn't find a solution.


Two answers are given below.


Given the points $\{(t_i,y_i)\}$, your cost function (the sum of squared perpendicular distances) is $$C=\sum_{i=1}^n\frac{(y_i-vt_i-A)^2}{1+v^2}=\frac{1}{1+v^2}\sum_{i=1}^n(y_i-vt_i-A)^2,$$ which comes from the formula for the distance from a point to a line (see the Wikipedia article of that name) applied to the line $y=vt+A$, rewritten as $y-vt-A=0$. Note that $C$ is non-linear in $v$, unlike ordinary linear least-squares regression, which minimizes vertical distances. The partial derivatives are \begin{align*} \frac{\partial C}{\partial A}&=-\frac{2}{1+v^2}\sum_{i=1}^n(y_i-vt_i-A), \\ \frac{\partial C}{\partial v}&=-2\,\frac{(1+v^2)\sum_{i=1}^n t_i(y_i-vt_i-A)+v\sum_{i=1}^n(y_i-vt_i-A)^2}{(1+v^2)^2}. \end{align*} Setting both to zero and clearing the nonzero factors leaves the system \begin{align*} 0&=\sum_{i=1}^n(y_i-vt_i-A), \\ 0&=(1+v^2)\sum_{i=1}^n t_i(y_i-vt_i-A)+v\sum_{i=1}^n(y_i-vt_i-A)^2. \end{align*} You can then solve this system numerically, e.g. with gradient descent or stochastic gradient descent.
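As a sketch of the numerical approach just mentioned (a hypothetical implementation, not part of the answer; the function name and step sizes are my own choices), plain gradient descent on $C$ using the two partial derivatives above might look like:

```python
import numpy as np

def fit_tls_line(t, y, lr=0.005, steps=50_000):
    """Minimize C(A, v) = sum((y - v*t - A)^2) / (1 + v^2) by gradient descent."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    A, v = y.mean(), 0.0                       # crude initial guess
    for _ in range(steps):
        r = y - v * t - A                      # residuals y_i - v t_i - A
        dA = -2.0 * r.sum() / (1 + v**2)
        dv = (-2.0 * ((1 + v**2) * (t * r).sum() + v * (r**2).sum())
              / (1 + v**2)**2)
        A -= lr * dA
        v -= lr * dv
    return A, v

# Points exactly on y = 2t + 1, so the optimum should be A = 1, v = 2.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * t + 1
A, v = fit_tls_line(t, y)
```

The cost is non-convex in $v$, so in general the result can depend on the starting point; for well-behaved data a simple initialization like the one above tends to suffice.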

It's highly doubtful you could get an analytic solution similar to the normal equations of linear regression. However, the first equation is linear in $A$, and it's straightforward to solve: \begin{align*} 0&=\sum_{i=1}^n(y_i-vt_i-A) \\ 0&=\sum_{i=1}^n(y_i-vt_i)-nA \\ nA&=\sum_{i=1}^n(y_i-vt_i) \\ A&=\frac1n \sum_{i=1}^n(y_i-vt_i). \end{align*} With a view towards substituting into the other equation, relabel the summation index: $$A=\frac1n \sum_{j=1}^n(y_j-vt_j). $$ Substituting then yields $$0=(1+v^2)\sum_{i=1}^n\left[t_i\left(y_i-vt_i-\frac1n \sum_{j=1}^n(y_j-vt_j)\right)\right]+v\sum_{i=1}^n\left[y_i-vt_i-\frac1n \sum_{j=1}^n(y_j-vt_j)\right]^{\!2}. $$ You may or may not have gained anything by the substitution.
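One thing the substitution does buy is that the problem is now one-dimensional in $v$: since $A$ is determined by $v$, the cost can be scanned over candidate slopes. A hedged illustration (the helper below is hypothetical, not from the answer, and a grid scan is only one crude way to do the one-dimensional search):

```python
import numpy as np

def best_v(t, y, vs):
    """Scan candidate slopes; for each v, A is the closed-form A(v) above."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    costs = []
    for v in vs:
        A = (y - v * t).mean()                 # A(v) = (1/n) sum(y_j - v t_j)
        r = y - v * t - A
        costs.append((r**2).sum() / (1 + v**2))
    i = int(np.argmin(costs))
    return vs[i], (y - vs[i] * t).mean()

# Points on y = 2t + 1 again; the scan should land near v = 2, A = 1.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * t + 1
v, A = best_v(t, y, np.linspace(-5, 5, 10_001))
```

A bracketing one-dimensional minimizer (golden-section, for instance) would be a more refined alternative to the uniform grid.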


Here's an algorithm rather than a closed-form solution (compare computing the median of a set of numbers): try all lines passing through pairs of input points (in fact, it suffices to try only the halving lines).
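A minimal sketch of this algorithm, in the plain $O(n^3)$ form that tries every pair of points without the halving-line refinement (all names here are hypothetical):

```python
import numpy as np
from itertools import combinations

def best_line_through_pairs(pts):
    """Return (cost, A, v): the line A + t*v through two input points
    minimizing the sum of perpendicular distances to all points."""
    pts = np.asarray(pts, float)
    best = None
    for i, j in combinations(range(len(pts)), 2):
        p, q = pts[i], pts[j]
        d = q - p
        n = np.array([-d[1], d[0]])            # normal to the line p + t(q - p)
        n = n / np.linalg.norm(n)
        total = np.abs((pts - p) @ n).sum()    # sum of perpendicular distances
        if best is None or total < best[0]:
            best = (total, p, d)
    return best

# Four of these five points lie on y = 2x; (3, 7) is off by 1 vertically,
# i.e. at perpendicular distance 1/sqrt(5) from that line.
pts = [(0, 0), (1, 2), (2, 4), (3, 7), (4, 8)]
cost, A, v = best_line_through_pairs(pts)
```

By the argument below, restricting the search to lines through two input points loses nothing, so this brute force is exact.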

Why it works: w.l.o.g. the optimal line passes through an input point (otherwise you could translate the line parallel to itself without increasing the objective), so consider each input point in turn as an origin through which the line passes. Fix also the line's "combinatorial type", i.e., the sets $U$ and $L$ of input points lying above and below the line, respectively. That is, consider the line's rotation about the origin between hitting a point from above and hitting one from below. The optimum is attained either at an extreme rotation (i.e., when the line passes through two input points) or in the interior -- and the claim is that the latter cannot happen in any non-trivial instance (one where the input points are not all collinear). Indeed, the distance from $(x_i,y_i)$ to the line $x\cos\phi+y\sin\phi=0$ (through the origin) is $\pm( x_i\cos\phi+y_i\sin\phi)$ (the sign depending on whether the point is above or below the line), so the total distance from all the points is

$$\sum_U(x_i\cos\phi+y_i\sin\phi) - \sum_L(x_i\cos\phi+y_i\sin\phi)=\Delta x\cos\phi+\Delta y\sin\phi,$$ where $\Delta x = \sum_U x_i - \sum_L x_i$ and $\Delta y= \sum_U y_i - \sum_L y_i$. But a weighted sum of cosine and sine can be made $0$ (by choosing the direction perpendicular to the weight vector $(\Delta x,\Delta y)$); hence, since the total distance is nonnegative, if the minimum is attained at a non-extreme angle, that minimum is $0$, which forces all the points onto the line.
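The identity above can be checked numerically. This hypothetical snippet (points and angles chosen arbitrarily) fixes the combinatorial type at $\phi=0.3$, computes $\Delta x$ and $\Delta y$ from the resulting above/below split, and verifies that the total distance equals $\Delta x\cos\phi+\Delta y\sin\phi$ at nearby angles with the same split:

```python
import numpy as np

pts = np.array([(1.0, 3.0), (2.0, 1.0), (-1.0, -2.0), (3.0, -1.0)])

def total_distance(phi):
    """Sum of distances to the line x*cos(phi) + y*sin(phi) = 0 through the origin."""
    s = pts[:, 0] * np.cos(phi) + pts[:, 1] * np.sin(phi)
    return np.abs(s).sum()

# Split the points into U (above) and L (below) at phi = 0.3.
phi0 = 0.3
s = pts[:, 0] * np.cos(phi0) + pts[:, 1] * np.sin(phi0)
U, L = pts[s > 0], pts[s < 0]
dx = U[:, 0].sum() - L[:, 0].sum()   # Delta x
dy = U[:, 1].sum() - L[:, 1].sum()   # Delta y
```

As long as $\phi$ stays within the rotation range where the split $(U, L)$ is unchanged, $\Delta x\cos\phi+\Delta y\sin\phi$ reproduces the total distance exactly.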