Minimize the number of points in a piecewise linear approximation

2.1k Views Asked by At

I have $m$ data points $(x_i,y_i)$ in a given interval. I would like to find a piecewise linear function $f(x)$ that approximate these $m$ points with a minimum number of points $n$ so that my approximation error is below a tolerance $\epsilon$.

My $m$ points: points

The function $f$ is a piecewise linear function defined with $n$ points $(x_a^{i},y_a^{i})$. For $n=4$, it would look like:

approximation

Approximation error:

$$\frac{1}{m} \sum_{1\le i\le m}(y_i-f(x_i))^2 \leq\epsilon$$

To solve that problem I need to find, for a given $n$, a way to obtain the optimal set of points $(x_a^{i},y_a^{i})$. I can try to minimize my approximation error with gradient descent, but the function is non-convex, so it might not converge to the global optimum.

If I solve the previous step, I can simply simply run the algorithm from $n=1,2,3,...$ and stop when my approximation error drops below $\epsilon$

I sounds like a rather common problem that perhaps already has a solution. Do you know of one, or can you propose an approach to that problem?

2

There are 2 best solutions below

0
On

It isn't simple because the piecewise linear function depends on the break points in a non differentiable way (it is however continuous). And thing get ugly if you varies the number of breaks.

It is much simpler to compute the best approximation for fixed breaks. Thus, a simple heuristic would be following:

  1. Start with the linear best approximation $f_0$ on say $[a,b]$ (that is only two breaks). If the error is sufficiently small, stop.

  2. Otherwise, add a break $c$ (for example in the middle) and compute the best approximation $f_1$. If the error is sufficiently small, stop.

  3. Otherwise, compare the error of $f_1$ on $[a,c]$ and $[c,b]$. Choose the subinterval with the larger error, say $[a,c]$, and add a new break in $[a,c]$. Compute the best approximation $f_2$. If the error is sufficiently small, stop.

  4. Otherwise, ... and so on

0
On

I don't know if it converges to a minimum, but once I made a function to "convert" GPS points into a road.

To do so I took a rectangular region whose side is the double of the tolerance accumulating points as long as the rectangle could contain all of them. At this point I started with another rectangle containing at least the last point of the former rectangle.