Measure of goodness of curve fit


I am trying to fit a cubic polynomial to a set of data points that may contain outliers. To detect them, I compute the residual error of the curve fit; if the residual error is below a threshold, I assume the fit is good and that there is no outlier. The data is assumed to be generated from a cubic polynomial, so there is no need to change the order of the polynomial being fit.

The attached figure shows one snapshot: a cubic polynomial (dashed red line) fitted to a set of data points (red points). There is clearly an outlier in the bottom-right corner which I need to reject, and I know that the RANSAC method can achieve that. My question, however, is about how the residual error is computed. I computed the root mean squared error for this fit and it was 0.0293, which is within my acceptance tolerance, yet I know this is a bad fit.
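For context, the RANSAC idea mentioned above can be sketched as follows. This is a minimal illustrative implementation, not a reference one: the iteration count, inlier threshold, and synthetic data are all assumptions chosen for the example.

```python
import numpy as np

def ransac_cubic(x, y, n_iter=200, thresh=0.05, seed=0):
    """Minimal RANSAC sketch for a cubic fit (parameters are illustrative)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros_like(x, dtype=bool)
    n = len(x)
    for _ in range(n_iter):
        # Four points determine a cubic exactly.
        sample = rng.choice(n, size=4, replace=False)
        coeffs = np.polyfit(x[sample], y[sample], deg=3)
        resid = np.abs(y - np.polyval(coeffs, x))
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit by least squares on the largest consensus set found.
    coeffs = np.polyfit(x[best_inliers], y[best_inliers], deg=3)
    return coeffs, best_inliers

# Synthetic cubic data with one injected outlier (not the data in the figure).
x = np.linspace(0.0, 1.0, 20)
y = 0.5 - 0.3 * x + 0.8 * x**2 - 0.4 * x**3
y[-1] += 0.5  # outlier at the right edge
coeffs, inliers = ransac_cubic(x, y)
```

Because the clean points lie exactly on a cubic here, almost any 4-point sample of clean points reproduces the true curve, so the injected outlier is flagged while all other points remain inliers.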

Is there any other way to calculate the goodness of fit?

RMSE = $\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$

where $y_i$ is the true value at $x_i$ and $\hat{y}_i = a_0 + a_1x_i + a_2x_i^2 + a_3x_i^3$ is the value obtained from the curve fit; $a_0, a_1, a_2, a_3$ are the coefficients of the fitted curve.
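The fit and the RMSE above can be computed directly with `numpy` (the data below is synthetic and only illustrative; note that `np.polyfit` returns the coefficients in the order $[a_3, a_2, a_1, a_0]$):

```python
import numpy as np

# Synthetic data from a known cubic, with one injected outlier
# (illustrative values, not the data from the figure).
x = np.linspace(0.0, 1.0, 20)
y = 0.5 - 0.3 * x + 0.8 * x**2 - 0.4 * x**3
y[-1] += 0.5  # outlier

# Least-squares cubic fit; coeffs = [a3, a2, a1, a0].
coeffs = np.polyfit(x, y, deg=3)
y_hat = np.polyval(coeffs, x)

# RMSE = sqrt of the mean squared residual.
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
```

A single large residual gets averaged over all $N$ points before the square root, which is exactly why one outlier can leave the RMSE below a fixed threshold.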

One alternative is to compute the error along the normal to the curve: to compute the error at $x_i$, I can compute the tangent to the curve at $(x_i, \hat{y}_i)$ and then take the perpendicular distance of the point $(x_i, y_i)$ from that tangent line. Do you think that makes sense?
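The tangent-distance idea can be written down in a few lines. For a line through $(x_i, \hat{y}_i)$ with slope $m_i = a_1 + 2a_2x_i + 3a_3x_i^2$, the perpendicular distance of $(x_i, y_i)$ from it is $|y_i - \hat{y}_i|/\sqrt{1+m_i^2}$, i.e. the vertical residual shrunk by the local slope. A sketch (function name and coefficient convention are assumptions, matching `np.polyfit`):

```python
import numpy as np

def normal_residuals(coeffs, x, y):
    """Perpendicular distance of each (x_i, y_i) from the tangent line
    at (x_i, y_hat_i). coeffs = [a3, a2, a1, a0] as from np.polyfit."""
    a3, a2, a1, a0 = coeffs
    y_hat = np.polyval(coeffs, x)
    slope = a1 + 2 * a2 * x + 3 * a3 * x**2  # dy/dx of the cubic at x
    return np.abs(y - y_hat) / np.sqrt(1 + slope**2)
```

Note that this only ever shrinks residuals (most strongly where the curve is steep), so it cannot make a masked outlier stand out more than the plain vertical residual does.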