How to estimate a straight line using linear regression when there are some outliers


I have a data set

    x           y
--------------------
505.000000 77.000000
507.000000 88.000000
509.000000 99.000000
509.000000 110.000000
511.000000 121.000000
511.000000 132.000000
511.000000 143.000000
513.000000 154.000000
513.000000 165.000000
513.000000 176.000000
515.000000 187.000000
515.000000 198.000000
517.000000 209.000000
517.000000 220.000000
517.000000 231.000000
519.000000 242.000000
519.000000 253.000000
519.000000 264.000000
521.000000 275.000000
521.000000 286.000000
523.000000 297.000000
523.000000 308.000000
523.000000 319.000000
525.000000 330.000000
525.000000 341.000000
507.000000 352.000000
475.000000 363.000000
443.000000 374.000000
411.000000 385.000000

I want to fit it with the function y = ax + b.

Using linear regression I got this result:

image of result

But I want to get something like this:

expected result

As there is a significant number of outliers, what method would be best for this situation? I googled and found that the Theil–Sen estimator is good for this purpose, but I have not found any good tutorial or code.

So, how can I remove these outliers before linear regression, or is there another method that fits my purpose, and where can I learn it?

There are 2 best solutions below

0
On

You can just remove the outliers from your data set manually and then run the regression on the remaining data. Otherwise, you're pretty much stuck with the result you first obtained. (I don't have enough rep to comment.)
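As a rough sketch of that manual approach in Python: it assumes the last four rows of the question's table (where x suddenly drops from 525 to 507, 475, 443, 411) are the outliers. That cut-off is my reading of the data, not something the question states. `fit_line` is just an illustrative helper name for an ordinary least-squares fit.

```python
# Data from the question: 29 (x, y) rows in their original order.
xs = [505, 507, 509, 509, 511, 511, 511, 513, 513, 513, 515, 515,
      517, 517, 517, 519, 519, 519, 521, 521, 523, 523, 523, 525, 525,
      507, 475, 443, 411]
ys = [77 + 11 * i for i in range(29)]  # 77, 88, ..., 385 as in the table


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Fit on everything, then on the data with the assumed outliers dropped.
a_all, b_all = fit_line(xs, ys)
# Assumption: the last four rows are the outliers, so keep the first 25.
a_clean, b_clean = fit_line(xs[:25], ys[:25])

print(f"all points:       y = {a_all:.2f} x + {b_all:.2f}")
print(f"outliers removed: y = {a_clean:.2f} x + {b_clean:.2f}")
```

With all 29 points the four trailing rows pull the slope negative. Once they are dropped, the slope follows the main trend of the data.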

0
On

A good approach for linear regression when you expect some outliers and don't want them to contribute to the error is RANSAC (random sample consensus). It repeatedly fits a line to a small random subset of the points, counts how many points lie close to that line, and keeps the fit with the most support. It is easy to implement, and it is designed precisely for this use case, where you believe some data points are erroneous and want to leave them out of the regression.
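Here is a minimal sketch of the RANSAC idea in plain Python on the question's data, not a reference implementation. The function names, the number of iterations (500), the residual threshold (20 units in y), and the fixed seed are all my own assumptions. The threshold in particular has to be tuned to the noise level of your data.

```python
import random

# Data from the question: 29 (x, y) rows in their original order.
xs = [505, 507, 509, 509, 511, 511, 511, 513, 513, 513, 515, 515,
      517, 517, 517, 519, 519, 519, 521, 521, 523, 523, 523, 525, 525,
      507, 475, 443, 411]
ys = [77 + 11 * i for i in range(29)]  # 77, 88, ..., 385 as in the table


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def ransac_line(xs, ys, n_iter=500, threshold=20.0, seed=0):
    """RANSAC for y = a*x + b; returns (a, b, inlier_indices).

    n_iter, threshold and seed are illustrative values, not tuned ones.
    """
    rng = random.Random(seed)
    n = len(xs)
    best = []
    for _ in range(n_iter):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue  # vertical pair, cannot be written as y = a*x + b
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        # Consensus set: points whose vertical residual is under the threshold.
        inliers = [k for k in range(n)
                   if abs(ys[k] - (a * xs[k] + b)) < threshold]
        if len(inliers) > len(best):
            best = inliers
    # Final step: least-squares refit on the largest consensus set found.
    a, b = fit_line([xs[k] for k in best], [ys[k] for k in best])
    return a, b, best


a, b, inliers = ransac_line(xs, ys)
print(f"RANSAC fit: y = {a:.2f} x + {b:.2f} using {len(inliers)} of {len(xs)} points")
```

If you would rather not write it yourself, scikit-learn ships `RANSACRegressor` and `TheilSenRegressor` in `sklearn.linear_model`, which cover both RANSAC and the Theil–Sen estimator mentioned in the question.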

If you want to understand it properly, watch Mubarak Shah's video tutorial on RANSAC. It is available for free on YouTube.

If you want to implement it in MATLAB, see http://in.mathworks.com/discovery/ransac.html