How Are Functions Predicted?


What process or logical steps do you take to predict a function from any dataset?

I don't want to predict the function using a specific dataset; I want to understand how you predict a function when approaching a new dataset.

For example, if you see [1,3,5,7,9], how would you determine which function to use to capture those data points? Then, how would you predict the function for a completely different dataset, such as [16, 202, 984, 1024, 1111]? What is common when predicting between those two different datasets?


There are 2 best solutions below


To some extent, this is impossible in general. There are infinitely many valid formulas and, for any given finite dataset, infinitely many of them match it. This is especially true since you haven't defined which operations you consider to be primitives.
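As a small illustration of the point above, here is a sketch of a whole family of formulas that all reproduce [1, 3, 5, 7, 9] at n = 0..4. The function name `candidate` and the free parameter `c` are made up for this example; any value of `c` gives a different formula that still matches the data.

```python
def candidate(n, c):
    """2n + 1 plus a term that vanishes at n = 0, 1, 2, 3, 4."""
    bump = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 2 * n + 1 + c * bump

# Every choice of c matches the five given values but disagrees on the next one.
for c in (0, 1, -3):
    values = [candidate(n, c) for n in range(5)]
    print(c, values, "next:", candidate(5, c))
```

The extra term is zero at each of the five given positions, so the data alone cannot distinguish between these formulas.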

One could impose an ordering on which functions are preferred, for example using the arithmetic hierarchy, with the shortest function that defines your dataset being the preferred one. However, finding that shortest function is effectively impossible algorithmically for all but the simplest cases, since doing so in general would require solving the halting problem.


This answer is about "determining a function" rather than "predicting a function".

For 1-dimensional data there are several methods, listed below. In practice, the general form of the equation is often known but its parameters are not. For example, you may know that the data fit a line of the form $y=mx$ without knowing the value of $m$. In that case $m$ is found by solving a simple equation.
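A minimal sketch of that last step, assuming the model $y=mx$ and a few noisy measurements. With a single exact point, $m = y/x$. With several points, one common choice (an assumption here, not something the answer specifies) is the least-squares slope $m = \sum x_i y_i / \sum x_i^2$. The function name `fit_slope` and the sample points are hypothetical.

```python
def fit_slope(points):
    """Least-squares slope m for the model y = m*x (line through the origin)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least one point with x != 0")
    return sxy / sxx

# Hypothetical measurements, roughly following y = 2x.
points = [(1, 2.1), (2, 3.9), (3, 6.2)]
print(fit_slope(points))
```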

For sequences, there is the OEIS (On-Line Encyclopedia of Integer Sequences), which is a very valuable site. There are other techniques, such as:

0- Clever observation of the relationship between input and output.

1- Least Squares Method

2- Lagrange's Interpolation and other methods

3- Some other methods - 2

4- Curve Fitting

5- An interesting calculator that groups the following methods in one place (linear regression, quadratic regression, cubic regression, power regression, ab-exponential regression, logarithmic regression, hyperbolic regression, exponential regression) is Function approximation with regression analysis.

6- See: Stack Exchange Similar Question.
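To make one of the listed techniques concrete, here is a short sketch of Lagrange interpolation using exact fractions. It assumes the values are indexed by n = 0, 1, 2, ... (the question does not give the inputs), and the function name `lagrange` is chosen for this example.

```python
from fractions import Fraction

def lagrange(points, x):
    """Evaluate the Lagrange interpolating polynomial through `points` at x."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                # Basis factor: 1 at x = xi, 0 at x = xj.
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total

# Assumption: the datasets are indexed by n = 0, 1, 2, 3, 4.
first = list(enumerate([1, 3, 5, 7, 9]))
second = list(enumerate([16, 202, 984, 1024, 1111]))

print(lagrange(first, 5))    # continues the 2n + 1 pattern
print(lagrange(second, 2))   # the given points are reproduced exactly
print(lagrange(second, 5))   # an extrapolation, not a value to trust
```

Interpolation always passes through every given point, so a zero error on the data says nothing about how well the polynomial describes values outside it.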

You need to determine which method gives you the function with the smallest error. Sometimes the function obtained will produce values that differ from the given values (this does not apply to exact sequence formulas)! If the error is not acceptable, you may have to change the method used. When the function obtained by one method produces large errors, consider using more than one method for a set of inputs. For example, one could fit a line through the first 2 input values, then a quadratic through the other three values.
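The comparison described above can be sketched as follows. The three candidate formulas are hypothetical stand-ins for whatever each fitting method would produce, and the sum of squared errors is one possible error measure among several.

```python
def sum_squared_error(f, values):
    """Sum of squared differences between f(n) and values[n], n = 0, 1, 2, ..."""
    return sum((f(n) - v) ** 2 for n, v in enumerate(values))

data = [1, 3, 5, 7, 9]

# Hypothetical candidates; in practice each would come from a fitting method.
candidates = {
    "2n + 1": lambda n: 2 * n + 1,
    "n^2 + 1": lambda n: n * n + 1,
    "2^n": lambda n: 2 ** n,
}

errors = {name: sum_squared_error(f, data) for name, f in candidates.items()}
best = min(errors, key=errors.get)
print(errors)
print("best:", best)
```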

Also note that the same set of values could be generated by more than one function!

In your case, using $n=0,1,2,3,\dots$, the sequence can be generated by the formula:

$a(n) = 2n + 1$

See: the OEIS entry for this specific case.
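One simple way to arrive at such a formula by "clever observation" is a difference table, sketched below. The helper names `differences` and `difference_table` are made up for this example. If the first differences are constant, the sequence is linear: a(n) = a(0) + d·n.

```python
def differences(seq):
    """Return the list of consecutive differences of seq."""
    return [b - a for a, b in zip(seq, seq[1:])]

def difference_table(seq):
    """Repeat differencing until a row is constant or a single value remains."""
    rows = [list(seq)]
    while len(rows[-1]) > 1 and len(set(rows[-1])) > 1:
        rows.append(differences(rows[-1]))
    return rows

for data in ([1, 3, 5, 7, 9], [16, 202, 984, 1024, 1111]):
    for row in difference_table(data):
        print(row)
    print()
```

For [1, 3, 5, 7, 9] the first differences are all 2 and a(0) = 1, which gives a(n) = 2n + 1. For the second dataset the rows only stop when a single value is left, which is no evidence of a simple polynomial pattern.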

It's important to also know that the function obtained may not be correct for values not already given. In your example, I assumed that the values obtained are 1,3,5,7,9,11, but maybe the number following 11 in a real-world experiment is 11.5! So you need to consider this fact too. That is why not everyone can predict oil prices or stock values: external factors affect the simple time-versus-value curves we see all the time.

For the second set, there is no sequence that readily helps. You could resort to the Least Squares Calculator with this input: (0,16), (1,202), (2,984), (3,1024), (4,1111) to get the formula:

$a(n)=301.2 n + 65$

Now, is this function good enough? It is not! A different method is needed.
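To check a straight-line fit like this by hand, here is a sketch of ordinary least squares in plain Python, assuming the five values are paired with n = 0..4. The function name `fit_line` is chosen for this example. It prints the fitted line and the residuals (the given value minus the fitted value at each point).

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

points = [(0, 16), (1, 202), (2, 984), (3, 1024), (4, 1111)]
slope, intercept = fit_line(points)
residuals = [y - (slope * x + intercept) for x, y in points]

print(round(slope, 1), round(intercept, 1))
print([round(r, 1) for r in residuals])

# Sensitivity check: the same fit using only the first four points.
print(fit_line(points[:4]))
```

Under these assumptions the fit over all five points is about $301.2 n + 65$, while the first four points alone give about $380.6 n - 14.4$, so a single point moves the line considerably. Residuals of a few hundred on values of this size are another sign that a straight line does not describe the data well.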