What is "Cross-Validation Error" in plain English?

69 Views Asked by At

Say you use Cross-Validation to fit a regression model to a dataset. You get a bunch of CV-scores (cross-validation errors). What exactly is a Cross-Validation Error?

1

There are 1 best solutions below

0
On

Cross-validation is used to validate the model trained on some data against out-of-sample data to prevent overfitting. Typically the errors are based on some measure of model performance - for regression models it is typically mean-squared-error.

Supposing you are using $k$-fold cross-validation where the dataset is split into $k$ equally sized sub-samples, the model is trained on the $\{1,...,k-1\}$ folds, validated against the $k$th fold and the CV-scores would be the mean-squared-error (or other metric) on the out-of-sample data for each fold.

You could fit the regression model on the full dataset and establish a baseline level of error. Then the CV-scores represent how well the model generalizes to unseen data, and you can take the mean of all CV-scores and compare it with the baseline to determine whether the model may be overfitting.