Is there any software that can estimate a matching function for a set of plot points?


I have a set of 2D coordinates that represent a curve, and I'm struggling to find a function that roughly matches them.

Is there any software (free, preferably) that can fairly accurately guess a function that mostly matches a set of points (y = f(x))?


There are 2 answers below.

Best answer:

Assuming that you can approximate the curve with a polynomial and you have enough points, you can fit one using any software that can solve a system of linear equations:

Say you have points $\{p_1, p_2, \dots, p_n\}$ with $p_i = (x_i, y_i)$. You construct a polynomial of degree $n-1$ and form $n$ equations in terms of its coefficients. For example, with two points it would look like: $$ a_1x_1+a_2 = y_1 $$ $$ a_1x_2+a_2 = y_2 $$

This is called interpolation, and more specifically, polynomial interpolation. The method guarantees that the polynomial passes through the points you specified, but it makes no guarantees about how well the polynomial represents the curve between or near those points. If you also know the values of the curve's derivatives at a point, you can use a Taylor expansion instead.
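As a concrete sketch of the above (the sample points here are made up for illustration — they lie on $y = x^2 - 1$), the interpolation system can be set up with a Vandermonde matrix and solved with NumPy:

```python
import numpy as np

# Illustrative sample points (x_i, y_i), chosen to lie on y = x^2 - 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-1.0, 0.0, 3.0, 8.0])

# One equation per point: the Vandermonde matrix has columns x^3, x^2, x, 1
A = np.vander(x, len(x))

# Solving the linear system gives the coefficients of the
# unique degree n-1 = 3 interpolating polynomial
coeffs = np.linalg.solve(A, y)

# Here the cubic and constant terms come out (numerically) zero,
# recovering x^2 - 1: coeffs is approximately [0, 1, 0, -1]
print(np.round(coeffs, 6))
```

With distinct $x_i$ the Vandermonde matrix is invertible, so the system always has exactly one solution; for many points, however, it becomes badly conditioned, which is one practical reason to prefer the regression approach in the other answer.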

Another answer:

Since you said the function should "mostly" fit the points, you can make this formal by assuming your data are function values with added noise, for some underlying function. Then you can do linear regression in any free tool such as Octave or R: pick a polynomial of fairly low degree compared to your number of points, and use linear regression to estimate its coefficients.

Regression is fast, so you can easily run it many times, which lets you choose the best polynomial degree for your data by cross-validation:

1. Split the data randomly into 10 sets of roughly equal size.
2. For each candidate degree, treat each of the 10 sets in turn as the "test" set: fit the polynomial to the other 9 sets combined, then evaluate the sum of squared errors on the held-out set. This gives an unbiased estimate of the fit's error.
3. Sum the test errors over all 10 choices of test set; the degree with the lowest total is the one you should use.

This process is known as cross-validation, and it is easy to code up in a language like Octave or R that will do the linear regression part for you.
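The steps above can be sketched in a few lines of Python with NumPy (the data here are synthetic — noisy samples of a sine curve — purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative noisy data from an underlying smooth function
x = np.linspace(0.0, 3.0, 120)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

def cv_error(x, y, degree, k=10):
    """Sum of squared test errors over k random folds."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    total = 0.0
    for test in folds:
        # Fit on the other k-1 folds, evaluate on the held-out fold
        train = np.setdiff1d(idx, test)
        coeffs = np.polyfit(x[train], y[train], degree)
        resid = y[test] - np.polyval(coeffs, x[test])
        total += np.sum(resid ** 2)
    return total

degrees = list(range(1, 9))
errors = [cv_error(x, y, d) for d in degrees]
best = degrees[np.argmin(errors)]  # degree with the lowest summed test error
print(best)
```

The same structure works in Octave or R; only the fitting and evaluation calls change (`polyfit`/`polyval` in Octave, `lm` with `poly()` in R).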

Any continuous function can be approximated arbitrarily well by a polynomial over a closed interval (the Weierstrass approximation theorem), with the approximation improving as the degree increases (as long as you're not over-fitting, hence the cross-validation). So this is a good general approach if you don't know what kind of smooth function to use for your fit.

However, so-called "local polynomial regression", developed by William Cleveland, is often much more accurate than global polynomial regression. R implements it in its loess() procedure, and I believe its default smoothing parameter usually works well enough that no cross-validation is needed. The estimated fit is not given by any simple formula, but loess() (via predict()) will give you fitted function values for any inputs you want; for example, you can evaluate the fit on a finely spaced uniform grid and plot those values to see what the fitted function looks like.
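To make the idea concrete without leaving Python, here is a rough sketch of local polynomial regression via weighted least squares in NumPy. The Gaussian kernel, fixed bandwidth, and synthetic data are all illustrative choices, not what loess() actually does internally (loess uses a tricube weight over a nearest-neighbour span):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative noisy samples of an underlying smooth function
x = np.linspace(0.0, 3.0, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

def local_poly_fit(x, y, x0, bandwidth=0.5, degree=1):
    """Fit a kernel-weighted polynomial around x0 and evaluate it at x0."""
    # Gaussian kernel: points near x0 dominate the fit
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    # np.polyfit weights multiply the unsquared residuals,
    # so pass sqrt(w) to weight the squared errors by w
    coeffs = np.polyfit(x, y, degree, w=np.sqrt(w))
    return np.polyval(coeffs, x0)

# Evaluate the fit on a finely spaced uniform grid, as suggested for plotting
grid = np.linspace(0.0, 3.0, 300)
fitted = np.array([local_poly_fit(x, y, g) for g in grid])
```

Each query point gets its own small weighted regression, which is why the result has no single closed-form formula — exactly the situation described above for loess().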