Fitting a curve that stays as straight as possible on a small, dense dataset


I have to fit a third-degree polynomial to a small, dense dataset. I've used polynomial regression with a least-squares error, but the result is undesirable. I will explain why.

Let the polynomial be: $$y = c_0+c_1x+\frac{1}{2}c_2x^2+\frac{1}{6}c_3x^3$$

And the dataset: $$x = [27, 32, 33, 34, 35, 38, 39, 40, 41, 42, 43]$$ $$y = [29, 29, 29, 29, 29, 29, 30, 30, 29, 29, 29]$$

By applying polynomial regression, I get

I understand that this is absolutely correct: it fits my data. But the desired behavior for this kind of point set is as follows (drawn by hand):

Also, if the points describe a curvature, the resulting polynomial should follow that curvature.

What I can't do to get to this result

  • Modify input dataset
  • Modify equation

What I can do

  • Modify anything about polynomial regression (like error function)
  • Replace polynomial regression

I've also tried Levenberg-Marquardt, but as expected the results are the same as polynomial regression.


There are 2 best solutions below

2
On

Besides the trivial solution of simply using an affine model instead of a cubic, if you switch from a quadratic residual to an absolute-value residual, you will obtain an affine response.

A fit based on absolute values, called $\ell_1$, is typically less sensitive to outliers. In your case the two values of $30$ will effectively be seen as outliers, and the $\ell_1$-optimal solution is $y(x) = 29$.

The $\ell_1$ model can be cast as a linear program by noting that minimizing $\sum |e_i|$, where $e_i$ is the residual at the $i$-th data point, can be written as: minimize $\sum t_i$ subject to $-t_i \leq e_i \leq t_i$.
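The linear program above can be sketched in Python as follows. This is a minimal illustration, assuming SciPy's `linprog` with the HiGHS solver and the residual convention $e_i = y_i - \hat{y}(x_i)$; the variable names (`X`, `cost`, `coef_l1`, and so on) are chosen here for illustration and are not part of the original answer.

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([27, 32, 33, 34, 35, 38, 39, 40, 41, 42, 43], dtype=float)
y = np.array([29, 29, 29, 29, 29, 29, 30, 30, 29, 29, 29], dtype=float)

# Design matrix for y = c0 + c1*x + c2*x^2/2 + c3*x^3/6 (the model from the question)
X = np.column_stack([np.ones_like(x), x, x**2 / 2, x**3 / 6])
n, p = X.shape

# Decision vector z = [c0, c1, c2, c3, t_1, ..., t_n]; the objective is sum(t_i)
cost = np.concatenate([np.zeros(p), np.ones(n)])

# With e = y - X c, the constraints -t <= e <= t become
#   -X c - t <= -y   (e <= t)
#    X c - t <=  y   (-t <= e)
A_ub = np.block([[-X, -np.eye(n)], [X, -np.eye(n)]])
b_ub = np.concatenate([-y, y])

# Coefficients are free; the slack variables t_i are non-negative
bounds = [(None, None)] * p + [(0, None)] * n

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
coef_l1 = res.x[:p]        # fitted c0..c3
fitted_l1 = X @ coef_l1    # model values at the data points
l1_cost = res.fun          # sum of absolute residuals
```

On this dataset the solver should return fitted values close to $29$ at every point, consistent with the $\ell_1$-optimal solution stated above, since only the two points at $y = 30$ contribute to the cost.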

0
On

For sure, the cubic fits your data, as shown below:

$$\left( \begin{array}{ccc}
x & y & \text{calc}\\
27 & 29 & 29.030 \\
32 & 29 & 28.832 \\
33 & 29 & 28.938 \\
34 & 29 & 29.063 \\
35 & 29 & 29.193 \\
38 & 29 & 29.490 \\
39 & 30 & 29.514 \\
40 & 30 & 29.480 \\
41 & 29 & 29.375 \\
42 & 29 & 29.186 \\
43 & 29 & 28.899
\end{array} \right)$$

The problem is that, for the parameters, one obtains

$$\begin{array}{cccc}
\text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\
c_0 & 112.129 & 52.7599 & \{-16.9699,241.228\} \\
c_1 & -7.49163 & 4.64423 & \{-18.8557,3.87239\} \\
c_2 & 0.442603 & 0.269298 & \{-0.216346,1.10155\} \\
c_3 & -0.0128499 & 0.00771852 & \{-0.0317365,0.00603661\} \\
\end{array}$$

and none of them is significant: every confidence interval contains zero.
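For readers who want to reproduce this kind of table, here is a minimal sketch using NumPy. It assumes ordinary least squares on the model from the question and the usual standard-error formula $\sqrt{\hat\sigma^2\,\mathrm{diag}\,(X^\top X)^{-1}}$ with $\hat\sigma^2 = \mathrm{RSS}/(n-p)$; the variable names are illustrative, and the tool used to produce the table above is not stated in the answer.

```python
import numpy as np

x = np.array([27, 32, 33, 34, 35, 38, 39, 40, 41, 42, 43], dtype=float)
y = np.array([29, 29, 29, 29, 29, 29, 30, 30, 29, 29, 29], dtype=float)

# Design matrix for y = c0 + c1*x + c2*x^2/2 + c3*x^3/6
X = np.column_stack([np.ones_like(x), x, x**2 / 2, x**3 / 6])
n, p = X.shape

# Ordinary least-squares estimate of c0..c3
coef_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted_ls = X @ coef_ls
resid = y - fitted_ls

# Residual variance with n - p degrees of freedom
sigma2 = resid @ resid / (n - p)

# (X^T X)^{-1} = R^{-1} R^{-T} from the QR factorisation X = Q R;
# this avoids explicitly forming the badly conditioned matrix X^T X
_, R = np.linalg.qr(X)
R_inv = np.linalg.inv(R)
se = np.sqrt(sigma2 * np.sum(R_inv**2, axis=1))

# Ratio of each estimate to its standard error
t_ratio = coef_ls / se

for name, c, s, t in zip(["c0", "c1", "c2", "c3"], coef_ls, se, t_ratio):
    print(f"{name}: estimate={c:.6g}  std.err={s:.6g}  t={t:.3f}")
```

A small ratio of estimate to standard error is another way of seeing that a coefficient is not clearly distinguishable from zero.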