Linear regression: residual values do not have zero mean


I fitted a linear regression model y ~ x in R by running:

lr <- lm(y ~ x, data = train)        # fit on the training set
p <- predict(lr, newdata = test)     # predict on the held-out test set
error <- p - test$y                  # predicted minus observed

The error does not have zero mean (in fact, mean(error) = -5). It is also right-skewed, and it does not resemble any distribution I know.
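A minimal sketch of the comparison involved here, run on simulated data because the real train and test sets are not shown. The data-generating line, the seed, and the 70/30 split are all assumptions. With an intercept in the model, lm forces the in-sample residuals to average zero, but nothing guarantees the same for predictions on a separate test set, so the two means are worth checking side by side.

```r
# Simulated data: an assumption for illustration, not the original data set.
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
y <- 2 + 3 * x + (rexp(n) - 1)   # right-skewed noise with mean zero
dat <- data.frame(x = x, y = y)

# Hypothetical 70/30 split into train and test.
idx   <- sample(n, 350)
train <- dat[idx, ]
test  <- dat[-idx, ]

lr <- lm(y ~ x, data = train)

# In-sample residuals average to zero (up to floating-point error)
# because the model includes an intercept.
in_sample_mean <- mean(residuals(lr))

# Out-of-sample errors carry no such guarantee.
p     <- predict(lr, newdata = test)
error <- p - test$y
out_sample_mean <- mean(error)

# One-sample t-test of whether the mean test error differs from zero.
tt <- t.test(error)

print(c(in_sample = in_sample_mean, out_of_sample = out_sample_mean))
print(tt$p.value)
```

If the in-sample mean is essentially zero while the test mean is far from it, that may point to a difference between the train and test sets rather than to the fit itself.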

I also tried replacing lm with rlm (from MASS), but the RMSE and MAE of the predictions do not improve.

What are some alternative approaches to explore in this situation? I suspect that the error term (\epsilon) is not independent of x, but I am not sure how to model that.
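One possible way to check whether the error spread depends on x is sketched below, again on simulated data. The data-generating line, the auxiliary regression of squared residuals on x, and the choice of weights 1 / x^2 are all assumptions made for illustration. They apply only if the error variance really does grow roughly with x^2, and they are not a claim about the actual data.

```r
# Simulated data with noise whose spread grows with x (an assumption).
set.seed(1)
n <- 2000
x <- runif(n, 1, 10)
y <- 2 + 3 * x + x * (rexp(n) - 1)   # skewed, mean-zero, heteroscedastic noise
dat <- data.frame(x = x, y = y)

# Ordinary least squares fit.
ols <- lm(y ~ x, data = dat)

# Auxiliary regression: do the squared residuals trend with x?
dat$res2 <- residuals(ols)^2
aux <- lm(res2 ~ x, data = dat)
aux_slope <- coef(aux)[["x"]]

# If the spread grows with x, weighted least squares is one option.
# Weights 1 / x^2 assume Var(epsilon | x) is proportional to x^2.
wls <- lm(y ~ x, data = dat, weights = 1 / x^2)
wls_slope <- coef(wls)[["x"]]

print(summary(aux)$coefficients)
print(coef(ols))
print(coef(wls))
```

A clearly positive slope in the auxiliary regression suggests the error variance increases with x. A plot of residuals(ols) against x gives the same information visually.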

Thanks