Can someone help me find, or give me, a proof of the PRESS (Predicted Residual Sum of Squares) formula for K-fold cross-validation?


Similar to the formula for the $n$-fold CV case (LOOCV), $$\mathrm{PRESS} = \sum_{i = 1}^{n}\left(\frac{y_i-\pmb{x}_i^\top \mathbf{X}^+\pmb{y}}{1-p_i}\right)^2$$ where $p_i$ is the $i$th leverage (this formula I understand), I want to understand how to prove the generalized formula for $K$-fold CV: $$\mathrm{PRESS} = \sum_{k = 1}^{K}\left\lVert\left(\mathbf{I}_{n_k} - \mathbf{P}_{n_k}\right)^{-1}\left(\pmb{y}_k-\mathbf{X}_k \mathbf{X}^+\pmb{y}\right)\right\rVert^2$$ Here, $n_k$ is the number of observations in the $k$th fold, and $\pmb{y}_k$ and $\mathbf{X}_k$ are the response vector and the feature matrix for the $k$th test set. The matrix $\mathbf{P}_{n_k}$ is the $k$th diagonal block of the projection matrix $\mathbf{P}=\mathbf{X}\mathbf{X}^+$, i.e. the $n_k \times n_k$ submatrix whose rows and columns are indexed by the $k$th fold.
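To make sure I am reading the formula correctly, here is a minimal numerical sketch that compares it against explicitly refitting on each training set. The data, the sizes `n`, `p`, `K`, and the random fold assignment are all assumptions made up for illustration; the check also assumes $\mathbf{X}$ and every training design have full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 30, 4, 5  # illustrative sizes, not taken from the course

# Synthetic full-column-rank design and response (assumed data).
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Random partition of the row indices into K folds.
folds = np.array_split(rng.permutation(n), K)

# Shortcut: fit once on all data, then rescale each fold's residuals.
P = X @ np.linalg.pinv(X)          # projection matrix P = X X^+
resid = y - P @ y                  # full-data residuals y - X X^+ y
press_shortcut = 0.0
for idx in folds:
    P_kk = P[np.ix_(idx, idx)]     # k-th diagonal block of P
    e_k = np.linalg.solve(np.eye(len(idx)) - P_kk, resid[idx])
    press_shortcut += e_k @ e_k

# Explicit K-fold CV: refit on each training set, predict the held-out fold.
press_explicit = 0.0
for idx in folds:
    train = np.setdiff1d(np.arange(n), idx)
    beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
    err = y[idx] - X[idx] @ beta
    press_explicit += err @ err

print(press_shortcut, press_explicit)
```

On this example the two quantities agree to floating-point precision, which is consistent with the formula but of course is not a proof.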

In the course I'm doing on linear models, the 'proof' isn't sufficiently detailed for me to follow it. It pretty much just refers to the Woodbury identity and then claims that the result follows. I can't seem to find any proofs online for this result, so I'm hoping someone on here could help me out or at least point me in the right direction.

Thank you!

As a side note, the formula for $n$-fold CV clearly only works if none of the leverages equals 1. What exactly does this mean? Of course, the $i$th leverage being 1 would imply that all other entries of the $i$th row and column of $\mathbf{P}$ are $0$, and that the $i$th fitted value is exactly the observed value. But is there some deeper conceptual understanding I'm missing? I can't see intuitively why this should cause the PRESS formula to break down. In other words, I can't quite tell what the formula is 'doing', that is, what 'shortcut' it takes that would be invalidated by the training residual of this specific point being 0. The same issue shows up in the assumption that $\left(\mathbf{I}_{n_k} - \mathbf{P}_{n_k}\right)$ is invertible, and I don't understand what exactly this implies conceptually.
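One observation that may bear on this, offered as a small experiment rather than an answer: in the toy design below (an assumption of mine, with an intercept plus an indicator column that is nonzero only for the first observation), the first leverage is 1, and deleting that observation leaves a training design that no longer has full column rank, so the held-out fit is not uniquely determined.

```python
import numpy as np

n = 6
# Hypothetical design: intercept column plus an indicator for observation 0.
X = np.column_stack([np.ones(n), np.zeros(n)])
X[0, 1] = 1.0

P = X @ np.linalg.pinv(X)   # projection matrix P = X X^+
leverage_0 = P[0, 0]        # leverage of the first observation

# Rank of the design before and after deleting observation 0.
rank_full = np.linalg.matrix_rank(X)
rank_without_0 = np.linalg.matrix_rank(X[1:])

print(leverage_0)           # numerically 1
print(P[0, 1:])             # rest of the first row is numerically 0
print(rank_full, rank_without_0)
```

Here the indicator column becomes identically zero once the first row is removed, so its coefficient cannot be estimated from the remaining data. Whether this is the general picture behind the invertibility assumption is exactly what I would like to understand.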