Does overfitting occur when we don't use a polynomial hypothesis in machine learning algorithms?


Let's say we have a linear regression algorithm in which the hypothesis looks like this:

$h(x) = a_0 + a_1 \times x_1 + a_2 \times x_1^2 + a_3 \times x_1^3 + a_4 \times x_1^4$

This can easily overfit when trained on a small, noisy dataset with only one feature. However, overfitting also occurs when we have many features, and I can't seem to wrap my head around how that happens. Say we have a dataset with 4 features and the hypothesis looks like this:

$h(x) = a_0 + a_1 \times x_1 + a_2 \times x_2 + a_3 \times x_3 + a_4 \times x_4$

How can overfitting happen with this? Doesn't this just produce the equivalent of a line (a flat hyperplane) in 5 dimensions? Thank you in advance!
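To make the first hypothesis concrete, here is a small numpy sketch (synthetic data, illustrative seed, all names my own) that fits both a degree-1 and a degree-4 polynomial to a few noisy samples of a truly linear function and compares train and held-out error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is linear; a degree-4 fit on few noisy points can chase the noise.
def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + rng.normal(0, 0.3, n)
    return x, y

def fit_poly(x, y, degree):
    # Least-squares fit of a_0 + a_1 x + ... + a_d x^d
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mse(coef, x, y):
    X = np.vander(x, len(coef), increasing=True)
    return np.mean((X @ coef - y) ** 2)

x_tr, y_tr = make_data(8)      # small, noisy training set
x_te, y_te = make_data(1000)   # large held-out set

for d in (1, 4):
    c = fit_poly(x_tr, y_tr, d)
    print(f"degree {d}: train MSE {mse(c, x_tr, y_tr):.3f}, "
          f"test MSE {mse(c, x_te, y_te):.3f}")
```

The degree-4 model always fits the training set at least as well (its column space contains the linear model's), which is exactly why low training error alone can be misleading.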


BEST ANSWER

It seems like you've understood the "wiggly polynomial" example of overfitting and taken a slightly wrong lesson from it. The reason that polynomials with many degrees of freedom are "bad" has nothing to do with the fact that they have twists and turns. In fact, if the generating process for the data has twists and turns, a model without them will be biased.

The issue is that they (may) have too many degrees of freedom. During the fitting process, the overly complicated model will fit the data very well... but the data is noisy, and so most of the "twists and turns" it picks up on are just noise. It's like a sports analyst digging for obscure statistics (14 out of the last 15 games have been won by the team with the oldest average age of assistant coach...) and then using those statistics to make very complicated and precise predictions. Of course the resulting predictions will be all over the place: they take apparent patterns from the past that were actually just random noise and assume they will continue.
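You can see this with a purely linear model, no polynomials at all. In this sketch (synthetic data, illustrative seed, names my own), every feature is independent noise with no relation to the target, yet a linear regression with nearly as many features as training samples "explains" the training data extremely well and generalizes to nothing:

```python
import numpy as np

rng = np.random.default_rng(1)

n_train, n_test, p = 30, 1000, 25   # nearly as many noise features as samples

# Every feature is pure noise: nothing here truly predicts y.
X_tr = rng.normal(size=(n_train, p))
y_tr = rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, p))
y_te = rng.normal(size=n_test)

def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: a perfectly "linear" hypothesis, like in the question.
coef, *_ = np.linalg.lstsq(add_intercept(X_tr), y_tr, rcond=None)

def r2(X, y):
    resid = y - add_intercept(X) @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print("train R^2:", r2(X_tr, y_tr))   # typically high: the model memorizes noise
print("test  R^2:", r2(X_te, y_te))   # near or below zero: nothing generalizes
```

The hypothesis here is exactly a flat hyperplane, yet with 26 free parameters and 30 observations it overfits badly: the "wiggles" live in parameter space, not in the shape of the function.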

So this has nothing in particular to do with the nonlinear shape of polynomials, just the number of parameters in the model (especially if the features don't actually have any relevance to what's being predicted).

(For this reason, I'm not a big fan of the 'wiggly polynomial' example. Also, the most striking thing about polynomials is how badly they can extrapolate. It is easy to conflate that with overfitting, but they are distinct, if related, problems.)
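The extrapolation point is easy to demonstrate with a small numpy sketch (synthetic data, illustrative seed, names my own): both a linear and a high-degree polynomial fit see only data on $[-1, 1]$, and are then asked to predict at $x = 3$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Truth is linear; both fits only ever see x in [-1, 1].
x_tr = np.linspace(-1, 1, 12)
y_tr = 2.0 * x_tr + rng.normal(0, 0.3, x_tr.size)

def fit(degree):
    X = np.vander(x_tr, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y_tr, rcond=None)
    return coef

def predict(coef, x):
    return np.vander(np.atleast_1d(x), len(coef), increasing=True) @ coef

truth_at_3 = 2.0 * 3.0
for d in (1, 9):
    pred = predict(fit(d), 3.0)[0]
    print(f"degree {d}: prediction at x=3 is {pred:.2f} (truth {truth_at_3})")
```

The high-degree fit may track the training points closely, but the $x^9$ term gets amplified by a factor of $3^9$ outside the training range, so the extrapolated prediction is typically wildly off even when the in-range fit looks reasonable.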