In the setting of linear regression:
The prediction error using k-fold cross-validation is given by: \begin{equation}\label{eq1} \begin{split} err_{k-fold} = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{n/k}\sum_{j\in ind[i]}l(h_{D_{(-i)}}(x_j),y_j) \end{split} \end{equation}
where $h_{D_{(-i)}}$ denotes the model trained on all folds except the $i$th one, and $ind[i]$ is the set of indices of the $n/k$ points in the $i$th fold.
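To make the formula concrete, here is a minimal sketch in Python of the $err_{k-fold}$ computation; the OLS learner, the squared loss, and the synthetic data are my placeholder choices, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

def kfold_err(X, y, k, fit, loss):
    """err_kfold: average the held-out loss over k folds, as in eq. (1)."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)  # index sets ind[1], ..., ind[k]
    fold_errs = []
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)      # D_(-i): all folds but the ith
        h = fit(X[train], y[train])                  # h_{D_(-i)}
        fold_errs.append(np.mean(loss(h(X[idx]), y[idx])))  # (k/n) * sum over fold i
    return np.mean(fold_errs)                        # (1/k) * sum over folds

# placeholder learner (ordinary least squares) and squared loss
def fit_ols(X, y):
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    return lambda Xnew: Xnew @ w

sq_loss = lambda pred, y: (pred - y) ** 2

# synthetic linear-regression data (assumed, for illustration only)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=20)
print(kfold_err(X, y, 4, fit_ols, sq_loss))
```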
I want to show that :
\begin{equation}\label{eq2} \begin{split} \mathop{\mathbb{E}}_{D\sim p}\left[ err_{k-fold}\right] = \mathop{\mathbb{E}}_{\substack{D'\sim p \\ (x,y)\sim p}}\left[ (y-h_{D'}(x))^2 \right] \end{split} \end{equation}
where $D$ is a dataset of size $n$ and $D'$ is a dataset of size $n-\frac{n}{k}$ (the size of a training set with one fold removed),
and then prove that $err_{k-fold}$ is an almost unbiased estimate of the risk of $h_D$.
My approach was to compute the expectation of $err_{k-fold}$ and use the fact that $D$ and $D'$ are sampled from the same distribution, but I am lost. Can you please help me?
Thanks
By linearity of expectation, you can push the expectation into the sums: $$E[\text{err}_{k\text{-}fold}] = \frac{1}{k} \sum_{i=1}^k \frac{1}{n/k} \sum_{j \in ind[i]} E[l(h_{D_{-i}}(x_j), y_j)].$$ Now observe that for every $i$ and $j$, the pair $(D_{-i}, (x_j, y_j))$ has the same joint distribution as $(D', (x,y))$: the point $(x_j, y_j)$ lies in the held-out fold, so it is drawn from $p$ independently of $D_{-i}$, and $D_{-i}$ is itself an i.i.d. sample of size $n - n/k$ from $p$. Hence the inner expectation is precisely $$\underset{\substack{D' \sim p \\ (x,y) \sim p}}{E}[l(h_{D'}(x), y)]$$ regardless of $i$ and $j$, and the double sum is just an average of $n$ copies of the same quantity, which proves \eqref{eq2}. For the second part: this shows $err_{k\text{-}fold}$ is an exactly unbiased estimate of the risk of a model trained on $n - n/k$ points; since that differs from training on all $n$ points only by one fold's worth of data, its risk is typically close to (and slightly larger than) the risk of $h_D$, which is what "almost unbiased" means here.
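The identity can also be checked numerically. Below is a hedged Monte Carlo sketch, where OLS as the learner and a Gaussian linear model as $p$ are my own assumptions: averaging $err_{k\text{-}fold}$ over many datasets of size $n$ should agree with the average squared error of a model trained on $n - n/k$ points and evaluated on one fresh point.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d, reps = 40, 4, 2, 2000
sigma = 0.5
w_true = np.array([1.0, -2.0])  # assumed ground-truth coefficients

def sample(m):
    """Draw a dataset of size m from p (assumed Gaussian linear model)."""
    X = rng.normal(size=(m, d))
    return X, X @ w_true + sigma * rng.normal(size=m)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def kfold_err(X, y):
    """err_kfold from eq. (1) with squared loss and OLS."""
    errs = []
    for idx in np.array_split(np.arange(len(y)), k):
        mask = np.ones(len(y), bool)
        mask[idx] = False                      # hold out fold i
        w = ols(X[mask], y[mask])              # h_{D_(-i)}
        errs.append(np.mean((y[idx] - X[idx] @ w) ** 2))
    return np.mean(errs)

# LHS of eq. (2): E_D[err_kfold] over datasets of size n
lhs = np.mean([kfold_err(*sample(n)) for _ in range(reps)])

# RHS of eq. (2): E_{D',(x,y)}[(y - h_{D'}(x))^2] with |D'| = n - n/k
rhs_samples = []
for _ in range(reps):
    w = ols(*sample(n - n // k))   # train on n - n/k points
    Xt, yt = sample(1)             # one fresh test point
    rhs_samples.append((yt[0] - Xt[0] @ w) ** 2)
rhs = np.mean(rhs_samples)

print(lhs, rhs)  # the two Monte Carlo estimates should be close
```

Both averages estimate the same risk, so they should agree up to Monte Carlo noise; increasing `reps` tightens the match.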