When does Mean Square Error increase?


As far as I know, we want the model to include as few regressors as possible, because the variance of the prediction $\hat y$ increases as the number of regressors increases.

But on the Hald cement data, I get a larger mean square error with one regressor than with two, three, or even all four regressors.

Also, the mean square error with two regressors is larger than with three.

Why?


By "number of regressors" do you mean the number of datapoints? The mean square error normalizes by the number of datapoints, so in that sense it is a comparable metric regardless of how many datapoints you are fitting.

One issue with larger numbers of datapoints is that you may have a harder time fitting the model. For example, suppose your model is the linear system $A x = b$. If $A$ is square and invertible, there is an exact solution, and the resulting error is 0; this can be described as overfitting your data. However, if the system is overdetermined ($A$ has more rows than columns), an exact solution generally does not exist, so the error will be higher than in the previous case.
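The contrast between the square and overdetermined cases can be sketched with NumPy on random synthetic data (a random Gaussian matrix is invertible with probability 1; this is an illustration, not the asker's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Square, invertible system: an exact solution exists, residual is ~0.
A_sq = rng.normal(size=(3, 3))
b_sq = rng.normal(size=3)
x_sq = np.linalg.solve(A_sq, b_sq)
print(np.linalg.norm(A_sq @ x_sq - b_sq))   # ~0 (machine precision)

# Overdetermined system: more rows than columns, usually no exact solution.
A_over = rng.normal(size=(10, 3))
b_over = rng.normal(size=10)
x_over, *_ = np.linalg.lstsq(A_over, b_over, rcond=None)
print(np.linalg.norm(A_over @ x_over - b_over))  # > 0 in general
```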


Now that I understand your statement better, I think I can better answer this question.

Is your model with one regressor a subset of your model with multiple regressors? E.g., are you fitting polynomials, with the coefficients as the regressors? If so, then either you're doing the fit wrong, or your fitting procedure is not finding the best values.

Let's think of an example. Suppose your model is a polynomial of order 1; then you have two coefficients you can vary to determine the fit. What if you instead use a polynomial of order 2? Then you have three coefficients you can vary. But you can always set the third coefficient to 0 and recover exactly the fit you got in the order-1 case. So you must be able to do better (or at least as well) with an order-2 polynomial than with an order-1 polynomial.
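This nesting argument can be checked numerically. A minimal sketch using NumPy's `polyfit` on synthetic noisy-line data (not the Hald data): the order-2 fit can never have a larger mean square error than the order-1 fit.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=x.size)  # noisy line

def poly_mse(deg):
    """Least-squares polynomial fit of given degree; return mean square error."""
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    return np.mean(resid ** 2)

mse1 = poly_mse(1)
mse2 = poly_mse(2)
print(mse1, mse2)
# The order-1 model is nested inside the order-2 model,
# so the order-2 MSE cannot exceed the order-1 MSE.
assert mse2 <= mse1 + 1e-12
```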

So either (A) you have a bug in your code and you're not determining the regressor values correctly, or (B) your procedure does not guarantee that you'll get the best regressor values.

Note: if the relationship between the regressors $x$ and the output $b$ is a linear system (like $A x = b$), then you can always determine $x$ using the Moore-Penrose pseudoinverse of $A$: $x = A^{+} b$. In this case, you are guaranteed to get the best values of $x$ in a mean-square-error sense. If this is your situation, then you must have a bug in your code.
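A minimal sketch of the pseudoinverse solution with `np.linalg.pinv` on random synthetic data; for a full-rank $A$ it agrees with NumPy's dedicated least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 3))   # overdetermined, full column rank w.p. 1
b = rng.normal(size=8)

x_pinv = np.linalg.pinv(A) @ b                  # Moore-Penrose pseudoinverse
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None) # direct least-squares solve

# Both minimize ||Ax - b||, so they coincide for full-rank A.
print(np.allclose(x_pinv, x_lstsq))  # True
```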