Formal underpinnings of cross-validation?


In the past, I've used cross-validation in a purely instrumental way, namely to ensure that models have decent predictive ability. I know that cross-validation works, but I don't know why it works on a formal mathematical level.

Is there a formal derivation showing that cross-validation of any form (leave one out, exhaustive, or other) maximizes predictive validity, or is cross-validation an ad hoc tool that we use because it intuitively makes sense?


Here's a nice explainer. Also, The Elements of Statistical Learning, specifically chapter 7 (p. 241), has a detailed explanation of cross-validation and in what sense it works.

Basically, it works in the same sense that the bootstrap works. We can use cross-validation to estimate the expected prediction error (averaged across all possible training sets; this is important) by sub-sampling from our training data. We assume our training data is a representative sample from the distribution of possible training data, and then use sub-samples (drawn without replacement) to simulate repeated draws from the underlying population of possible training sets.
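To make the "sub-sample to simulate training sets" idea concrete, here is a minimal sketch of k-fold cross-validation using only NumPy. The model (ordinary least squares) and the synthetic regression data are my own illustrative choices, not anything from the answer; the point is just that each fold's held-out error is one draw of the prediction error for a model fit to a sub-sampled training set, and averaging over folds estimates the expected error over such training sets.

```python
import numpy as np

def kfold_cv_mse(X, y, k=5, seed=0):
    """Estimate expected squared prediction error by k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)        # shuffle once
    folds = np.array_split(idx, k)  # k disjoint sub-samples (no replacement)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit OLS on the training sub-sample: one simulated "draw"
        # of a training set from the underlying population.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred = X[test] @ beta
        errors.append(np.mean((y[test] - pred) ** 2))
    # Average over folds: estimates the *expected* prediction error,
    # averaged over training sets of size roughly n*(k-1)/k.
    return float(np.mean(errors))

# Synthetic linear-regression example (illustrative only)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=200)
print(kfold_cv_mse(X, y, k=5))
```

Note that the estimate targets models trained on about (k-1)/k of the data, which is exactly why (as below) it does not directly give the error of the model fit to the full training set.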

One thing that cross-validation is not good for is estimating the prediction error that results from fitting your model to the full training set. To do that, you need to collect a new test set (at which point you may well want to re-fit to the larger dataset, leading to the same conundrum). However, absent that extra data, we might as well go with the model with the best unconditional expected error.