The real problem is about bus, I have some bus GPS data(which I see it as different points), and I know the bus route (which is defined by station points and a series of points between two stations), now for the GPS data, I want to determine the point is between which two adjacent stations. The bus GPS data is collected from bus in its way, the location of data is likely not in the curve of the bus route, instead the location is "near" the curve.
I try to abstract the problem into math description,
let S be stations, S1 means the first station of the bus line, S2 means the second
let p be the points between two stations
Problem description is:
Given S and p, how to find i where a new point P is between Si and S(i+1)
For example: picture example
In this picture(blue points mean S, red points p orange points P), result should be that point P is between P1 and P2
What I have thought about:
Method 1: for lines defined by Si, Si+1, let f1=dist(P, Si)+dist(P,Si+1), the answer is the line which has min(f1)
Method 2: for lines defined by Si, Si+1, let dist be the distance between point P to line defined by two points(Si, Si+1), f2 = dist(P,Si,Si+1), the answer is the line which has min(f2)
Method 3: for lines defined by Si, Si+1, p1, p2...pn, let f3=(dist(P,Si)+dist(P,Si+1)+(∑(i=1 to n)(dist(P,pi)))/(n+2),the answer is the line which has min(f3)
But these three methods can't solve the problem at all. Any ideas to solve this problem or any advice to describe the problem in a more brief way are appreciated.(I try hard to make my problem clear, but seem to fail)
Let $P_i = ( x_i , y_i )$ for $i = 1 .. N$ be the vertices along the polyline formed by the bus route, and $P = ( x, y )$ be the current location of the bus.
One approach is to pick the polyline segment $\overline{P_i P_{i+1}}$ closest to $P$. This involves finding the distance between a point and a line segment, not the distance between a point and a line. (Remember, the latter extends to infinity at both ends.)
I would use the following procedure to find out the distance between $P$ and each line segment:
Find the unit vector $U$ from $P_i$ towards $P_{i+1}$, and their distance $d$: $$d = \left\lVert P_{i+1} - P_i \right\rVert = \sqrt{ (x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2} \tag{1}\label{NA1}$$ $$U = \frac{ P_{i+1} - P_i }{d} = ( x_u , y_u ), \quad \left \lbrace \begin{aligned} x_u &= \frac{x_{i+1} - x_i}{d} \\ y_u &= \frac{y_{i+1} - y_i}{d} \end{aligned} \right . \tag{2}\label{NA2}$$
Parametrise the line as $$L(t) = P_i + t U = ( x_L , y_L )$$ $$\left\lbrace \begin{aligned} x_L(t) &= x_i + t x_u \\ y_L(t) &= y_i + t y_u \end{aligned} \right .$$ The important point is to note that $L(0) = P_i$, and $L(d) = P_{i+1}$; i.e. $0 \le t \le d$, for $L(t)$ to be a point on the line segment.
Find the point $L(t)$ closest to $P$.
When the line through $L(t)$ and $P$ is perpendicular to the unit vector $U$ (or any line through $P_i$ and $P_{i+1}$), then $L(t)$ is the closest point to $P$ on the line, and their dot product is zero: $$\left ( L(t) - P \right ) \cdot U = 0$$ i.e. $$( x_L(t) - x ) x_u + ( y_L(t) - y ) y_u = 0$$ and substituting $x_L(t)$ and $y_L(t)$ we get $$( x_i + t x_u - x ) x_u + ( y_i + t y_u - y ) y_u = 0$$ and solving for t, noting that $x_u^2 + y_u^2 = 1$, we get $$t = (x - x_i) x_u + (y - y_i) y_u \tag{3}\label{NA3}$$
Calculate the distance $r$ at that point.
We can use either $$r = \sqrt{ (x_i + t x_u - x)^2 + (y_i + t y_u - y)^2 } \tag{4a}\label{NA4a}$$ or $$r = \left\lvert x_u ( y - y_i ) - y_u ( x - x_i ) \right\rvert \tag{4b}\label{NA4b}$$ as they are equivalent (when $\eqref{NA3}$ is true and $x_u^2 + y_u^2 = 1$). I recommend using the latter in computer programs, as taking the absolute value is much faster than taking a square root.
Determine the distance $R$ between point $P$ and line segment $\overline{P_i P_{i+1}}$: $$R = \begin{cases} r, & 0 \le t \le d \\ \sqrt{(x_i - x)^2 + (y_i - y)^2}, & t < 0 \\ \sqrt{(x_{i+1} - x)^2 + (y_{i+1} - y)^2}, & t > d \end{cases} \tag{5}\label{NA5}$$ The first case occurs when $L(t)$ is on the line segment, so we use the distance from $P$ to the line. The second case occurs when $L(t)$ is on the line, before $P_i$. The third case occurs when $L(t)$ is on the line, after $P_{i+1}$.
Point $P$ is "on" the polyline segment that minimizes $R$, so you need to calculate it for each line segment, and pick the segment with smallest $R$.
Note that the above method may not be sufficient when the bus $P$ is close to some station $P_k$, because $P$ may be closer to the station than to the line from that station to the next. So, in the case where station $P_k$ is closest to $P$, you cannot tell whether $P$ is between $P_{k-1}$ and $P_k$, or between $P_k$ and $P_{k+1}$. You'd need to remember bus state, whether the last few measurements indicated distance to $P_k$ was increasing or decreasing.