As far as I know, we want the model to include as few regressors as possible because the variance of the prediction $\hat y$ increases as the number of regressors increases.
But for the Hald cement data, I get a larger mean squared error with one regressor than with two, three, or even all four regressors.
The mean squared error with two regressors is also larger than with three regressors.
Why?
By "Number of Regressors" do you mean the number of datapoints? The mean squared error normalizes for the number of datapoints, so in that sense it is a comparable metric regardless of how many datapoints you are fitting.
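To illustrate the normalization point, here is a minimal sketch (plain Python, with made-up numbers) showing that dividing the sum of squared residuals by $n$ keeps the metric comparable across sample sizes:

```python
def mse(y_true, y_pred):
    # Mean squared error: sum of squared residuals divided by n,
    # so the value does not grow just because n grows.
    sq_residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(sq_residuals) / len(sq_residuals)

# Same per-point error (0.5) at two different sample sizes
# gives the same MSE, even though the raw sum of squares differs.
small = mse([1.0, 2.0], [1.5, 2.5])
large = mse([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(small, large)  # 0.25 0.25
```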
One issue with larger numbers of datapoints is that you may have a harder time fitting the model. For example, suppose your model is the linear system $Ax = b$. If $A$ is square (and invertible), there is an exact solution, and the resulting error is 0; this can be described as overfitting your data. However, if the system is overdetermined ($A$ has more rows than columns), an exact solution may not exist, and so the error will be higher than in the square case.
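The square-versus-overdetermined contrast can be checked numerically. A sketch with NumPy and randomly generated systems (the specific matrices are illustrative, not from the Hald data): a square, invertible $A$ admits an exact solution with residual essentially 0, while an overdetermined $A$ generally leaves a nonzero residual even at the least-squares optimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Square system: with an invertible A, an exact solution exists,
# so the residual is zero up to floating-point error ("overfitting").
A_sq = rng.standard_normal((3, 3))
b_sq = rng.standard_normal(3)
x_sq, *_ = np.linalg.lstsq(A_sq, b_sq, rcond=None)
print(np.linalg.norm(A_sq @ x_sq - b_sq))  # ~0

# Overdetermined system (more rows than columns): least squares
# finds the best fit, but a random b almost surely lies outside
# the column space of A, so the residual stays strictly positive.
A_over = rng.standard_normal((6, 2))
b_over = rng.standard_normal(6)
x_over, *_ = np.linalg.lstsq(A_over, b_over, rcond=None)
print(np.linalg.norm(A_over @ x_over - b_over))  # > 0
```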