Why use regularization to reduce over-fitting?


I'm having trouble understanding why we should use regularization to combat over-fitting when we can simply reduce the degree of our polynomial function. Is it because it saves us the time of having to come up with a polynomial of lower degree? For linear regression, most of the work in finding a fit comes from computing the coefficients b0, b1, etc., which we can obtain from a closed-form equation (sometimes known as the normal equations). If we use regularization, we also have to come up with a lambda that makes sense. Please give me an example or some insight into the benefits of using regularization.
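For concreteness, here is roughly what I mean by the closed-form solution. This is only a sketch with made-up synthetic data, and the penalty value lambda = 1.0 is arbitrary; note that ridge regression has an analogous closed form, so adding a penalty does not forfeit the one-shot normal-equations solution:

```python
import numpy as np

# Synthetic data for illustration: intercept column plus one feature.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=20)

# OLS via the normal equations: b = (X'X)^{-1} X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression's closed form only adds lambda*I to the Gram matrix:
# b = (X'X + lambda*I)^{-1} X'y
lam = 1.0  # arbitrary choice for illustration
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```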


It is true that polynomial regression makes it far easier to overfit than OLS linear regression does, but in certain settings even OLS linear regression can overfit. Suppose we have 2 observations and 1 variable. Two points determine a line, so we will have a perfect fit with all residuals equal to zero; but we will be overfitting the data, and performance on an independent test set will be relatively poor. The plot below shows this.

[Overfitting plot: a red line fit to only two points, and a blue line fit to all of the data]

The red line is fit only to the two black points and passes through them exactly. The blue line is fit to all of the data. The blue line has non-zero residuals, whereas the red line does not (that is, the blue line has non-zero training MSE while the red line has zero training MSE). But looking at the graph, we can see that the MSE of the red line with respect to the remaining points is much larger than that of the blue line. Remember that in practice we would compare these two models via either the cross-validated MSE or the MSE evaluated on an independent holdout set.
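The two-point story can be reproduced numerically. This is a sketch under assumptions of my own (a made-up true relationship y = x + noise): a line fit through the two training points has essentially zero training MSE, while its MSE on a fresh holdout sample from the same process is much larger.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data-generating process for illustration: y = x + noise.
x_train = np.array([0.0, 1.0])  # only two training points
y_train = x_train + rng.normal(scale=0.5, size=2)

# A degree-1 polynomial through two points fits them exactly.
slope, intercept = np.polyfit(x_train, y_train, 1)
train_mse = np.mean((np.polyval([slope, intercept], x_train) - y_train) ** 2)

# On fresh data from the same process, the error is generally much larger.
x_test = rng.uniform(0.0, 1.0, size=100)
y_test = x_test + rng.normal(scale=0.5, size=100)
test_mse = np.mean((np.polyval([slope, intercept], x_test) - y_test) ** 2)
```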

The moral of the story is that as the dimensionality of the data increases relative to the number of observations, so does the ease of overfitting. This is why we sometimes want models that are even less flexible than OLS linear regression, which is where methods like ridge regression and the lasso come in. Their regularization parameter penalizes large coefficients, which effectively restricts the set of models that can be fit and so reduces the flexibility of the fit. This in turn reduces the variance of the fit, and although the bias may increase, the net result is often a smaller test MSE.
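To make the "reduced flexibility" concrete: the ridge solution shrinks toward zero as the penalty grows, which is the mechanism behind the variance reduction. A small sketch, using synthetic data and arbitrary lambda values of my own choosing:

```python
import numpy as np

# Synthetic setup for illustration: 10 features but only 15 observations.
rng = np.random.default_rng(3)
X = rng.normal(size=(15, 10))
y = rng.normal(size=15)

def ridge(X, y, lam):
    # Closed-form ridge solution: b = (X'X + lambda*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Larger lambda -> smaller coefficient norm -> a less flexible fit.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.01, 1.0, 100.0)]
```

The coefficient norm decreases monotonically in lambda, which is exactly the sense in which the family of fittable models is constrained.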