How to find an optimum distance between 2 lines?

369 Views Asked by At

In the below graph there are 4 series of points. These points are symmetric respect to $OX$ axis and also with the $OY$ axis.

enter image description here

I have to create/to draw two parallel lines in order to include all these points in between. Then, the distance between these two lines will be the error which I need to compute.

My idea:

  1. Find out the highest point for each position on $OX$ axis.

  2. Find out the highest point from step 1.

  3. Compute the slope from the point found at step 2 to the points from step 1.

  4. Find out the minimum slope

  5. We have 2 points: $A1(x_{1}, y_{1})$ and $B1(x_{2}, y_{2})$ marked with blue circle on my picture. Having these 2 points and knowing that the points are symmetric we can conclude also that the second line, parallel with the first one will pass through $A2(-x_{1}, -y_{1})$ and $B2(-x_{2},-y_{2})$ marked with red.

  6. Now, it can be computed the distance between these 2 lines

BUT there is also another idea better than mine, I suppose.

I compute this error using only 4 points, but every point on the graph has its own weight and importance. So, I am thinking somehow to take into consideration all these points. Maybe it is an optimization/minimization problem.

4

There are 4 best solutions below

0
On BEST ANSWER

$\color{brown}{\textbf{The task standing.}}$

Let there be $\;n\;$ series of points in which the points of each series have the same abscissas and different ordinates. It is required to find a pair of parallel straight lines with the minimum possible distance along the ordinate, between which the points of all series are located.

The given data can be represented in the form of the vectors

  • $x_i,\;i=1,2\dots n$ - the abscissas of the series;
  • $y_i,\;i=1,2\dots n$ - the lowest ordinates in the series;
  • $z_i,\;i=1,2\dots n$ - the highest ordinates in the series.

Proposed algorithm consists of the next steps:

  • calculation of the convex hull;
  • detalization of the optimization task;
  • solving of the optimization task.

$\color{brown}{\textbf{Calculation of the convex hull.}}$

The given vectors $\;x_i, y_i,z_i\;$ allow to define the lower and upper polylines of the given set of points, wherein concave part of these polylines does not influence to the final result. Elimination of the inner vertice from the given table holds the convex hull of the given set of points.

The convex hull vertice $L_k=(\overline x_k,\overline y_k),\;(k=1,2,\dots\overline m)\;$ can be obtained by the next algorithm:

  • $(1)\;k=1,\; i=1$
  • $(2)\;\overline x_k = x_i,\; \overline y_k = y_i;$
  • $(3)\; \text{if } (i==n) \text{ then } \textbf{stop}$
  • $(4)\;k=k+1,\; i=\underset{n\ge j>i}{\text{argmin}}\dfrac{y_j-y_i}{x_j-x_i}$
  • $(5)\;\text{go to } (2)$

I.e. $\;L\;$ is the chain of vertice $\;(x_i,y_i)\;$, where each next vertex provides the lowest slope with the previous one.

The convex hull vertice $H_k=(\hat x_k,\hat y_k),\;(k=1,2,\dots,\hat m)\;$ can be obtained by the next algorithm:

  • $(1)\;k=1,\; i=1$
  • $(2)\;\hat x_k = x_i,\; \hat y_k = z_i;$
  • $(3)\; \text{if } (i==n) \text{ then } \textbf{stop}$
  • $(4)\;k=k+1,\; i=\underset{n\ge j>i}{\text{argmax}}\dfrac{z_j-z_i}{x_j-x_i}$
  • $(5)\;\text{go to } (2)$

I.e. $\;H\;$ is the chain of vertice $\;(x_i,z_i)\;$, where each next vertex provides the highest slope with the previous one.

For example, the data of the table $(1)$ \begin{vmatrix} i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ x_i & -7.0 & -6.2 & -4.2 & -2.9 & 0.0 & 2.9 & 4.2 & 6.2 & 7.0 \\ y_i & -4.5 & -3.8 & -3.6 & -2.8 & -1.8 & 1.0 & 0.8 & 1.6 & 2.2 \\ z_i & -2.2 & -1.6 & -0.8 & -1.0 & 1.8 & 2.8 & 3.6 & 3.8 & 4.5\tag1 \end{vmatrix}

can be presented via the vertices of the convex hull in the form of

$$L = \left(\dbinom{-7.0}{-4.5},\dbinom{-4.2}{-3.6},\dbinom{0.0}{-1.8},\dbinom{6.2}{1.6},\dbinom{7.0}{2.2}\right),\tag2$$

$$H = \left(\dbinom{-7.0}{-2.2},\dbinom{-6.2}{-1.6},\dbinom{0.0}{1.8},\dbinom{4.2}{3.6},\dbinom{7.0}{4.5}\right).\tag3$$

Also, the convex hull can be obtained graphically.

Convex hall plot

$\color{brown}{\textbf{Detalization of the optimization task.}}$

The obtained convex hull can be presented in the form of $$y(x)\in[y^\,_L(x),y^\,_H(x)],\tag4$$

where $$y^\,_L(x) = \overline y_k + \overline s_k(x-\overline x_k),\;\text{if}\; x\in[\overline x_k,\overline x_{k+1}];\qquad \overline s_k = \dfrac{\overline y_{k+1}-\overline y_k}{\overline x_{k+1}-\overline x_k};\tag5$$

$$y^\,_H(x) = \hat y_k + \hat s_k(x-\hat x_k),\;\text{if}\; x\in[\hat x_k,\hat x_{k+1}];\qquad \hat s_k = \dfrac{\hat y_{k+1}-\hat y_k}{\hat x_{k+1}-\hat x_k}.\tag6$$

Let $\;s\;$ is the slope of the required parallel lines. Then the equation of the lower line is $$Y_L(s,x) = \overline y^\,_{l(s)} + s(x-\overline x_{l(s)}),\tag7$$ where $$l(s) = \begin{cases} 1,\;\text{if}\;s\in(-\infty,\overline s_1]\\ k+1,\;\text{if}\;s\in[\overline s_k,\overline s_{k+1}]\\ \overline m,\;\text{if}\;s\in[\overline s_m,\infty)\\ \end{cases}\tag{7a}$$ is the number of the lower boundary vertex.

The equation of the higher line is $$Y_H(s,x) = \hat y^\,_{h(s)} + s(x-\hat x_{h(s)}),\tag8$$ where $$h(s) = \begin{cases} \hat m,\;\text{if}\;s\in(-\infty,\hat s_m]\\ k+1,\;\text{if}\;s\in[\hat s_{k+1},\hat s_k]\\ 1,\;\text{if}\;s\in[\hat s_1,\infty)\\ \end{cases}\tag{8a}$$ is the number of the higher boundary vertex.

In the previous example, from $(2),(7a)$ should $$ l(s) = \begin{cases} 1,\;\text{if}\;s\in(-\infty,\frac9{28}]\\ 2,\;\text{if}\;s\in[\frac9{28},\frac37]\\ 3,\;\text{if}\;s\in[\frac37,\frac{17}{31}]\\ 4,\;\text{if}\;s\in[\frac{17}{31},\frac34]\\ 5,\;\text{if}\;s\in[\frac34,\infty) \end{cases}\Rightarrow Y_L(s,x) = \begin{cases} -4.5+s(x+7.0),\;\text{if}\;s\in(-\infty,\frac9{28}]\\ -3.6+s(x+6.2),\;\text{if}\;s\in[\frac9{28},\frac37]\\ -1.8+sx,\;\text{if}\;s\in[\frac37,\frac{17}{31}]\\ 1.6+s(x-6.2),\;\text{if}\;s\in[\frac{17}{31},\frac34]\\ 2.2+s(x-7.0),\;\text{if}\;s\in[\frac34,\infty). \end{cases} $$

From $(3),(8a)$ should $$h(s) = \begin{cases} 5,\;\text{if}\;s\in(-\infty,\frac9{28}]\\ 4,\;\text{if}\;s\in[\frac9{28},\frac37]\\ 3,\;\text{if}\;s\in[\frac37,\frac{17}{31}]\\ 2,\;\text{if}\;s\in[\frac{17}{31},\frac34]\\ 1,\;\text{if}\;s\in[\frac34,\infty)\\ \end{cases}\Rightarrow Y_H(s,x) = \begin{cases} 4.5+s(x-7.0),\;\text{if}\;s\in(-\infty,\frac9{28}]\\ 3.6+s(x-4.2),\;\text{if}\;s\in[\frac9{28},\frac37]\\ 1.8+sx,\;\text{if}\;s\in[\frac37,\frac{17}{31}]\\ -1.6+s(x+6.2),\;\text{if}\;s\in[\frac{17}{31},\frac34]\\ -2.2+s(x+7.0),\;\text{if}\;s\in[\frac34,\infty). \end{cases} $$

And the distance between the boundary parallel lines with the given slope $\;s\;$ equals to

$$D(s) = Y_H(s,x) - Y_L(s,x) = \begin{cases} 9.0-14.0s,\;\text{if}\;s\in(-\infty,\frac9{28}]\\ 7.2-8.4s,\;\text{if}\;s\in[\frac9{28},\frac37]\\ 3.6,\;\text{if}\;s\in[\frac37,\frac{17}{31}]\\ -3.2+6.2s,\;\text{if}\;s\in[\frac{17}{31},\frac34]\\ -4.4+14.0s,\;\text{if}\;s\in[\frac34,\infty).\tag9 \end{cases} $$

$\color{brown}{\textbf{Solving of the optimization task.}}$

Detalized optimization task looks simple.

In particular, from $(9)$ should $$\;\min\limits_{\large s\in(-\infty,\frac9{28}]} D(s) = 9-14\cdot\frac 9{28} = 4.5,$$ $$\;\min\limits_{\large s\in[\frac9{28},\frac37]} D(s) = 7.2-8.4\cdot\frac 37 = 3.6,$$ $$\;\min\limits_{\large s\in[\frac37,\frac{17}{31}]} D(s) = 3.6,$$ $$\;\min\limits_{\large s\in[\frac{17}{31},\frac34]} D(s) = -3.2+6.2\cdot\frac 17{31} = 3.6,$$ $$\;\min\limits_{\large s\in[\frac34,\infty)} D(s) = -4.4+14.0\cdot\frac 34 = 6.1,$$ and $$\mathbf{\min D(s) = 3.6 \;\text{at}\; s\in\left[\frac37,\frac{17}{31}\right]},$$ $$Y_L(s,x) = 1.8-sx,\quad Y_L(s,x) = 1.8+sx.$$

3
On

The two lines can be parameterized as $y=ax+b$ and $y=ax-b$. The distance between the lines is given by $2|b| / \sqrt{a^2+1}$. You are therefore interested in solving \begin{align} \min_{a,b} \quad & \frac{2b}{\sqrt{a^2+1}} \\ \text{s.t.} \quad & ax_i+b \geq y_i \quad i=1,\ldots,n \\ & ax_i-b \leq y_i \quad i=1,\ldots,n \end{align} The constraints ensure that the lines $y=ax+b$ and $y=ax-b$ are above and below the datapoints $(y_i,x_i)_{i=1}^n$, respectively (so you know $|b|=b$). The objective function is not convex in $a$ (and the constraints make it difficult to do a nonlinear reparameterization to make it convex). The only thing working in your favor is that the problem only has three variables. BARON will have no problem solving this to optimality. You could do some preprocessing and for each constraint only include the extreme datapoints (for each $x$ only include the highest point for the first constraint, and the lowest point for the second constraint).

2
On

You have two decision variables: $a$ represents the common slope, and $b$ represents the $y$-intercept of the upper line. Instead of minimizing the distance between the lines $y=ax+b$ and $y=ax-b$, you can minimize the sum of weighted distances (weight $w_i$) from each point $i$ to the closer line. The problem is to minimize $$\sum_i w_i \left(\min(a x_i + b - y_i, y_i - (a x_i - b))\right)^2$$ subject to linear constraints \begin{align} a x_i + b &\ge y_i &\text{for all $i$}\\ a x_i - b &\le y_i &\text{for all $i$} \end{align}

0
On

One thing is to find the minimum band between two parallel lines that encompasses all the points, as you state at the beginning.
In this case, as you said, only the extremal points will be of importance and all others are not considered.

In this case your algorithm is quite good, considering that the values are anti-symmetric. and I do not see that there might be a much better one.

Another thing is what you say at the end, that you would like to consider the contribution of all the points by establishing which linear tendency they have, and how much they depart (or obey) to that.

That is exactly the subject of Linear regression.

Since your data are anti-symmetric, the barycenter (average $x$, average $y$) of the cloud of points will be at the origin and the linear tendency will reduce to a $y = mx$. The problem then is to determine $m$ and relevant confidence interval for it and for the intercept $b$ around $b=0$.

But for a statiscally significant approach you shall first establish some Assumptions, based on the knowledge of the "physical" system that generates the data.

Prior to fixing the most appropriate assumptions it is not possible to answer to your question.
In the simplest case you will be led to adopt the Least Squares method,