When fitting a polynomial to data points, how do I determine a reasonable degree to use?


Suppose there is a set of data points $(x_i,y_i)$. I would like to know whether it is more reasonable to assume a polynomial relation of degree $m$ between them or one of degree $n$. Is there a way to measure this? I know that the Lagrange polynomial fits the data exactly, but, for example, the physics formula $F=ma$ shows that sometimes a linear polynomial is the correct model for the phenomenon.


There are 2 best solutions below

0
On

Part of the issue is whether you want your function to fit the data "as closely as possible", or if you want it to hit every data point exactly.

For example, if you want to fit some data that appears linear, using a linear least squares approximation to find the two coefficients that minimize the error is the right way to go. However, if you want an exact fit, you might want to look at Lagrange interpolation.

It sounds like you want an "as close as possible" fit, but you also want to compare the accuracy of polynomials of different degrees. You can use least squares techniques to find the coefficients of a polynomial of a given degree. To do this, you will use a matrix containing powers of the $x$-values of your data points and a vector containing your coefficients.

Say we have $d$ data points, and we want a degree $n$ polynomial. Then our matrix will have $d$ rows and $n+1$ columns. The $i$th row contains the powers, 0 through $n$, of the $x$-value of the $i$th data point. The vector contains the constant term, then the linear coefficient, and so on.
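As a minimal sketch of the matrix described above, assuming NumPy is available: `np.vander` with `increasing=True` produces this layout directly. The sample $x$-values and the degree are made up for illustration.

```python
import numpy as np

# Hypothetical x-values of d = 4 data points (not taken from the question).
x = np.array([0.0, 1.0, 2.0, 3.0])
n = 2  # desired polynomial degree

# Row i holds x_i**0, x_i**1, ..., x_i**n, so the matrix is d x (n + 1).
A = np.vander(x, n + 1, increasing=True)

print(A.shape)
print(A)
```

The first column is all ones because it multiplies the constant term of the coefficient vector.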

Multiplying the matrix and the vector gives you a vector of dimension $d$, independent of the degree of the polynomial used. Typically we use these objects to minimize the error, but once you have the best coefficients for a given degree, you can multiply the matrix by the coefficient vector and then subtract the vector containing the $y$-values. The norm of this vector, (X Powers)*(Coefs) - (Y data), is the square root of the sum of the squares of the errors at each data point.
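Written out, with $A$ standing for the matrix of powers, $c = (c_0, \dots, c_n)^T$ for the coefficient vector and $y$ for the vector of $y$-values (these symbol names are chosen here for illustration), the quantity described above is:

$$\lVert A c - y \rVert_2 = \sqrt{\sum_{i=1}^{d} \bigl(p(x_i) - y_i\bigr)^2}, \qquad p(x) = \sum_{j=0}^{n} c_j x^j .$$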

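If you compute this norm for several different degrees, you can find the degree whose polynomial has the lowest error, and that is the closest approximation among the degrees tested. Keep in mind that the least-squares error cannot increase as the degree grows, and for distinct $x$-values it reaches zero once the degree is one less than the number of data points (the interpolating polynomial), so the lowest error alone always favors the highest degree tested. A more useful signal is the degree beyond which the error stops dropping appreciably.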
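The comparison across degrees might be sketched as follows, assuming NumPy and using `np.linalg.lstsq` for the least-squares step. The data set (a quadratic plus a small alternating perturbation) and the helper name `residual_norm` are invented for illustration.

```python
import numpy as np

# Made-up data: a quadratic plus a small alternating perturbation.
x = np.linspace(-1.0, 1.0, 11)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.02 * (-1.0) ** np.arange(x.size)


def residual_norm(x, y, degree):
    """Least-squares fit of the given degree; returns ||A c - y||_2."""
    A = np.vander(x, degree + 1, increasing=True)  # powers 0..degree
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)  # best coefficients
    return float(np.linalg.norm(A @ coefs - y))


norms = {degree: residual_norm(x, y, degree) for degree in range(6)}
for degree, norm in norms.items():
    print(f"degree {degree}: residual norm {norm:.4f}")
```

On data like this the norm falls sharply between degree 1 and degree 2 and changes little afterwards. Since raising the degree can only lower the least-squares residual, that levelling-off is usually a better guide than the bare minimum.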

Best of luck!

0
On

If the fitting criterion is not specified in an unambiguous mathematical form, one cannot answer your question definitively. For example, "as close as possible" is not a mathematical specification.

First, one has to specify the fitting criterion: least mean square error, least mean absolute error, least mean square relative error, or some other criterion.
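To make the three named criteria concrete, here is a small sketch of the quantity each one minimizes, assuming NumPy. The function names and sample values are illustrative only. Note that ordinary least-squares routines minimize only the first of these; fitting under the other two generally needs a different solver.

```python
import numpy as np


def mean_squared_error(y, y_fit):
    return float(np.mean((y_fit - y) ** 2))


def mean_absolute_error(y, y_fit):
    return float(np.mean(np.abs(y_fit - y)))


def mean_squared_relative_error(y, y_fit):
    # Only meaningful when no observed y-value is zero.
    return float(np.mean(((y_fit - y) / y) ** 2))


# Made-up observed values and fitted values.
y = np.array([1.0, 2.0, 4.0])
y_fit = np.array([1.0, 3.0, 2.0])

print(mean_squared_error(y, y_fit))
print(mean_absolute_error(y, y_fit))
print(mean_squared_relative_error(y, y_fit))
```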

Second, one has to define the maximum acceptable value of the error (in the above sense) according to the chosen criterion.

From a practical viewpoint, in the case of polynomials, one can run successive regressions with increasing $n$ until the error (in the above sense) becomes equal to or lower than the specified acceptable value.
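The procedure above can be sketched as a loop, assuming NumPy and taking the root mean square error as the criterion (one possible choice, not the only one). The function name `smallest_adequate_degree`, the tolerance and the data are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def smallest_adequate_degree(x, y, max_error, max_degree):
    """Return (degree, coefs) for the first degree whose RMS error is at
    most max_error, or None if no degree up to max_degree qualifies."""
    for degree in range(max_degree + 1):
        coefs = P.polyfit(x, y, degree)  # lowest-order coefficient first
        rms = np.sqrt(np.mean((P.polyval(x, coefs) - y) ** 2))
        if rms <= max_error:
            return degree, coefs
    return None


# Made-up data lying exactly on the cubic y = x**3 - x.
x = np.linspace(-2.0, 2.0, 9)
y = x**3 - x

result = smallest_adequate_degree(x, y, max_error=1e-8, max_degree=5)
print(result[0] if result is not None else "no degree met the tolerance")
```

Keeping `max_degree` below the number of data points avoids the degenerate case in which the polynomial simply interpolates every point.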