Does anyone understand the paragraph below?
The paragraph comes from the cross-validation article on Wikipedia.
"It can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets)."
Thanks in advance.
When you fit a model on a training set, a simple measure of model fit is MSE, which is basically the average squared distance between the observed y values and the predicted y values from your model. It turns out that MSE is an optimistic measure of the predictive accuracy of your model if you were to apply the model to a brand new dataset on which the model was not trained. The general purpose of cross-validation is to estimate the accuracy of your model when applied to a new dataset.
However, in the case of linear regression, it turns out that the factor by which the training MSE understates the expected MSE on a new dataset is known: the expected training MSE equals
$\dfrac{n-p-1}{n+p+1}$ times the expected validation MSE, where n is the number of observations in the training dataset and p is the number of explanatory variables (the intercept brings the number of estimated coefficients to p + 1). Rearranging, linear regression gives the estimate
$ MSE_{validation}=\dfrac{n+p+1}{n-p-1} MSE_{train}$.
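One way to check this relationship is a small simulation (a sketch under one reading of the Wikipedia statement: a fixed design matrix with fresh responses drawn at the same design points, and p counting the slope coefficients, the intercept adding one more estimated parameter):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 30, 4, 1.0  # n observations, p slopes, noise sd

# Fixed design: intercept column plus p random covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = rng.normal(size=p + 1)

mse_train, mse_valid = [], []
for _ in range(20000):
    # Draw a training set, fit OLS, record in-sample MSE
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse_train.append(np.mean((y - X @ beta_hat) ** 2))

    # Fresh responses at the same design points play the role
    # of a validation set
    y_new = X @ beta + sigma * rng.normal(size=n)
    mse_valid.append(np.mean((y_new - X @ beta_hat) ** 2))

ratio_sim = np.mean(mse_train) / np.mean(mse_valid)
ratio_theory = (n - p - 1) / (n + p + 1)
print(f"simulated ratio:   {ratio_sim:.3f}")
print(f"theoretical ratio: {ratio_theory:.3f}")
```

With n = 30 and p = 4 the theoretical ratio is 25/35 ≈ 0.714, and the simulated ratio of average training MSE to average validation MSE should land close to that.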
Therefore, in ordinary linear regression this formula already estimates how the model would perform on a validation dataset, so there is no need to use cross-validation for that purpose.