Program to find closest function to fit arbitrary data

4.6k Views Asked by At

I've wanted this for years, but have never come across anything; a program for Windows to find the closest function to fit arbitrary data.

The data I feed it is simple: A table with two columns comprising an initial number on the left side, and the converted number on the right.

This below is a very simple example. The program should easily be able to see the pattern:

  • 10 -> 110
  • 20 -> 410
  • 30 -> 910
  • 40 -> 1610

So there, the program would find the formula to be x^2+10.

More complicated may be something like this:

  • 5 -> 16
  • 7 -> 23
  • 15 -> 250
  • 25 -> 300
  • 35 -> 30000

I made those numbers up. Before anyone brings it up, I know there are an infinite number of functions which will successfully work with the above data (if I didn't say that, I just know someone would jump on my toes for it). However, I want a program, perhaps using genetic programming or advanced B-spline/NURB curve type math to find an accurate (doesn't have to be a perfect) fit, and if at all possible, a small and simple function. The winning function can use addition, multiplication, exponentiation, double exponentiation, or basically anything it fancies to find the function.

If the program uses genetic programming, then the GP's fitness function will take into account the simplicity/mathematical steps as a component of overall fitness.

Ideally, the program would allow two or more inputs (rather than one) for each output, but I'd be grateful for something which just works with a single input and output as demonstrated in the two examples above.

The program should be so simple that even a child could use it.

4

There are 4 best solutions below

3
On

Use Excel. Make a chart from your values, and then use the "Add Trendline" function. It allows you to choose polynomial, logarithmic, exponential formulae to do the fitting. By adjusting the parameters, you can make trade-offs between the simplicity of the formula and the goodness of fit.

If you choose the option to "Display Equation on Chart", you get to see what formula was used for fitting. For your first set of numbers, if you choose polynomial fitting, then Excel gives you the formula $x^2 + 10$, as you would expect (almost, anyway).

You said that the program should be "so simple that even a child could use it". Children use Excel (my children do, anyway).

2
On

Polynomial fitting comes to mind for me as well (per @bubba). A genetic algorithm approach could be something like this:

Define a chromosome as $[P_1,exp(P_2),F(P_3)]$ where $P_i$ represents a fitted polynomial, $E$ an exponential function, and $T$ a finite trigonometric series.

The Fitness function could be the Akaike Information Criterion or Bayesian Information Criterion using a normal likelihood function with $\sigma=$ standard deviation of the residuals.

Basically, your program would create "vectors" of coefficients for the three classes of models above (including a zero-coefficent] and then fit the model to the data, estiamte residual standard devition, calculate likelihood and then the AIC or BIC. Then, mutate, populate, and iterate.

2
On

I once had a testversion of "graphprism" and learned that it just did what you are asking for: trying various models, from simple to complex, to fit data. The functions to be involved could be configured (polynomial, exponential, trigonometric,..., mixed...) as well as something like the "difficulty degree" - it was really impressing when the fitting process tried various models and the list of possible formulae for approximations grew slowly while the software was analyzing and re-analyzing. Something similar, if I recall right (but can miss here), with "eureqa", but I don't remember really its concept.

It's long time I used it and I'm not sure, whether and if under which name it has be continued. Here are two links for the likely relevant software,
http://www.nutonian.com/products/eureqa/
http://www.graphpad.com/guides/prism/6/curve-fitting/
but these are both commercial products now; the latter seems to offer a 30-day trial version. If it is really that program that I had that time there should still be a very nice, very involved and still comprehensible manual with it.

0
On

This is hard. It depends a lot on what you have experience with. Suppose I gave you $$1 \to 31\\2 \to 52\\3 \to 86\\ 4 \to 141\\5 \to 230$$ Would you recognize the outputs as three less than Fibonacci numbers? In this case you could take the log of the output and find it is not far from a straight line. Douglas Hofstadter has worked in this area for decades. There are two stages: find function(s) that reproduce the input data, then (if you have more than one) find the simplest. The simplest can be in the eye of the beholder.