I had a requirement to implement polynomial curve fitting in software, which I have done using multiple regression and Gaussian Elimination with Partial Pivoting.
I have done this for 2nd, 5th and 6th order fits but have found that, for my test data, the 6th order fit is not stable and does not match the fit I get using LINEST in Excel.
For a 6th order polynomial fit $a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6$, I solve the system
$$ \scriptsize \begin{pmatrix} n & \Sigma x_i & \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6\\ \Sigma x_i & \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7\\ \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8\\ \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9\\ \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x_i^{10}\\ \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x_i^{10} & \Sigma x_i^{11}\\ \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x_i^{10} & \Sigma x_i^{11} & \Sigma x_i^{12}\\ \end{pmatrix} \begin{pmatrix} a_0\\ a_1\\ a_2\\ a_3\\ a_4\\ a_5\\ a_6\\ \end{pmatrix}= \begin{pmatrix} \Sigma y_i\\ \Sigma x_iy_i\\ \Sigma x_i^2y_i\\ \Sigma x_i^3y_i\\ \Sigma x_i^4y_i\\ \Sigma x_i^5y_i\\ \Sigma x_i^6y_i\\ \end{pmatrix} $$ by Gaussian Elimination on the augmented matrix $$ \scriptsize \left[ \begin{array}{ccccccc|c} n & \Sigma x_i & \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma y_i\\ \Sigma x_i & \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma xy_i\\ \Sigma x_i^2 & \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x^2y_i\\ \Sigma x_i^3 & \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x^3y_i\\ \Sigma x_i^4 & \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x_i^{10} & \Sigma x^4y_i\\ \Sigma x_i^5 & \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x_i^{10} & \Sigma x_i^{11} & \Sigma x^5y_i\\ \Sigma x_i^6 & \Sigma x_i^7 & \Sigma x_i^8 & \Sigma x_i^9 & \Sigma x_i^{10} & \Sigma x_i^{11} & \Sigma x_i^{12} & \Sigma x^6y_i\\ \end{array} \right] $$
To show the problem, my test data is
$$ \scriptsize \begin{array}{ |l| c| } \hline x & y \\ \hline 2.85 & 11.18299371 \\ 2.9 & 10.4794225 \\ 2.95 & 10.4232207 \\ 3 & 10.01520707 \\ 3.05 & 9.630199653 \\ 3.1 & 9.354016384 \\ 3.15 & 8.91347577 \\ 3.2 & 8.187397533 \\ 3.25 & 7.964603263 \\ 3.3 & 7.444917065 \\ 3.35 & 7.264166216 \\ 3.4 & 7.082181813 \\ \hline \end{array} $$
which gives the initial matrix (to 3 d.p for presentation)
$$ \scriptsize \left[ \begin{array}{ccccccc|c} 12 & 37.5 & 117.545 & 369.562 & 1165.375 & 3685.676 & 11690.062 & 107.942\\ 37.5 & 117.545 & 369.562 & 1165.375 & 3685.676 & 11690.062 & 37182.318 & 334.570\\ 117.545 & 369.562 & 1165.375 & 3685.676 & 11690.062 & 37182.318 & 118589.015 & 1040.164\\ 369.562 & 1165.375 & 3685.676 & 11690.062 & 37182.318 & 118589.015 & 379233.370 & 3243.656\\ 1165.375 & 3685.676 & 11690.062 & 37182.318 & 118589.015 & 379233.370 & 1215867.379 & 10145.719\\ 3685.676 & 11690.062 & 37182.318 & 118589.015 & 379233.370 & 1215867.379 & 3907915.779 & 31830.088\\ 11690.062 & 37182.318 & 118589.015 & 379233.370 & 1215867.379 & 3907915.779 & 12590528.440 & 100158.346\\ \end{array} \right] $$
Using LINEST in Excel on the test data gives a fit of
$$35692 + 0x- 53938x^2 + 45571x^3 - 16221^4 + 2733x^5 -180x^6$$
whereas the solution I get through solving the matrix with Gaussian Elimination is
$$140051 - 192699x + 93919x^2 - 14763x^3 - 2414^4 + 1054x^5 - 95x^6$$
which clearly is incredibly different. So, with these results in mind, my questions are:
- Is there something about the test data I'm using that lends itself to numerical instability? If so, are there further techniques I can use (within the same general framework) to improve my result in the general case? It should be noted that the test data is representative of real data in my application (though it is a smaller dataset).
- How does Excel achieve its fit, and can I trust it? If I knew this, I might be able to match their procedure.
- Is there a better way (more accurate and stable) of curve fitting than using Gaussian Elimination, which is a method I can implement in software?