I have $m$ data points $(x_i,y_i)$ on a given interval. I would like to find a piecewise linear function $f(x)$ that approximates these $m$ points with the minimum number of points $n$ such that the approximation error stays below a tolerance $\epsilon$.
The function $f$ is piecewise linear, defined by $n$ points $(x_a^{i},y_a^{i})$. For $n=4$, it would look like:
Approximation error:
$$\frac{1}{m} \sum_{1\le i\le m}(y_i-f(x_i))^2 \leq\epsilon$$
To solve this problem I need, for a given $n$, a way to obtain the optimal set of points $(x_a^{i},y_a^{i})$. I could minimize the approximation error with gradient descent, but the objective is non-convex, so it might not converge to the global optimum.
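For a fixed $n$, this minimization could be sketched as follows, using SciPy's derivative-free Nelder–Mead instead of gradient descent (since $f$ is not differentiable in the break abscissae); the data, initial guess, and all names here are my own illustration:

```python
import numpy as np
from scipy.optimize import minimize

def mse(params, x, y, n):
    """Mean squared error of the piecewise linear interpolant
    through the n candidate points encoded in `params`."""
    xa = np.sort(params[:n])   # break abscissae, kept sorted for np.interp
    ya = params[n:]            # ordinates at the breaks
    return np.mean((y - np.interp(x, xa, ya)) ** 2)

# Toy data: a noisy V shape with its kink at x = 0.4.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.abs(x - 0.4) + 0.01 * rng.standard_normal(x.size)

n = 4
# Initial guess: equally spaced breaks, ordinates read off the data.
x0 = np.concatenate([np.linspace(0.0, 1.0, n),
                     np.interp(np.linspace(0.0, 1.0, n), x, y)])
res = minimize(mse, x0, args=(x, y, n), method="Nelder-Mead",
               options={"maxiter": 5000, "fatol": 1e-12})
```

As said, the objective is non-convex, so a run like this may land in a local minimum; restarting from several initial guesses helps.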
If I can solve the previous step, I can simply run the algorithm for $n=1,2,3,\dots$ and stop as soon as the approximation error drops below $\epsilon$.
It sounds like a rather common problem that perhaps already has a solution. Do you know of one, or can you propose an approach?


It isn't simple, because the piecewise linear function depends on the break points in a non-differentiable (though continuous) way. And things get ugly if you vary the number of breaks.
It is much simpler to compute the best approximation for fixed breaks. Thus, a simple heuristic would be the following:
Start with the best linear approximation $f_0$ on, say, $[a,b]$ (that is, only two breaks). If the error is sufficiently small, stop.
Otherwise, add a break $c$ (for example at the midpoint) and compute the best approximation $f_1$. If the error is sufficiently small, stop.
Otherwise, compare the errors of $f_1$ on $[a,c]$ and $[c,b]$. Choose the subinterval with the larger error, say $[a,c]$, and add a new break inside $[a,c]$. Compute the best approximation $f_2$. If the error is sufficiently small, stop.
Otherwise, ... and so on
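The steps above can be sketched as follows. Here I take "best approximation for fixed breaks" to mean the continuous piecewise linear least-squares fit, which is a linear problem over the hat-function basis; the function names, the midpoint split, and the `max_breaks` safeguard are my own choices:

```python
import numpy as np

def fit_fixed_breaks(x, y, breaks):
    """Least-squares fit of a continuous piecewise linear function with
    fixed break abscissae; returns the ordinates at the breaks and the
    fitted values at x."""
    # Column j is the j-th hat (tent) basis function evaluated at x.
    A = np.column_stack([np.interp(x, breaks, row)
                         for row in np.eye(len(breaks))])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return c, A @ c

def greedy_fit(x, y, eps, max_breaks=50):
    """Repeatedly split the subinterval with the largest accumulated
    squared error until the mean squared error drops below eps."""
    breaks = np.array([x.min(), x.max()], dtype=float)
    while True:
        c, fx = fit_fixed_breaks(x, y, breaks)
        if np.mean((y - fx) ** 2) <= eps or len(breaks) >= max_breaks:
            return breaks, c
        # Accumulate squared error per subinterval [breaks[k], breaks[k+1]].
        seg = np.clip(np.searchsorted(breaks, x, side="right") - 1,
                      0, len(breaks) - 2)
        errs = np.bincount(seg, weights=(y - fx) ** 2,
                           minlength=len(breaks) - 1)
        k = int(np.argmax(errs))
        breaks = np.insert(breaks, k + 1, 0.5 * (breaks[k] + breaks[k + 1]))
```

Splitting at the midpoint is only one choice; splitting at the point of largest residual, or locally optimizing the position of each new break, may reach the same $\epsilon$ with fewer breaks.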