Expected Risk in Machine Learning


I am currently working through some statistical learning theory, and the following is confusing me.

For a fixed learning algorithm $A$ that maps a training sample $S$ to a function (the "predictor") $f_{S}$, we define the risk as $$\mathcal{R}(f_{S})= \mathbb{E}[L(f_{S}(X),Y)]$$ where $L$ is a suitable loss function, $X$ is the vector of predictors and $Y$ is the response variable. The expectation is over the underlying distribution $P$ of $(X,Y)$.

Now the expected risk is defined as $$\mathbb{E}_{S}[\mathcal{R}(f_{S})]$$ where $S \sim P^{n}$ (we assume i.i.d. sampling from the underlying distribution, of course).

Now comes the tricky part. Draw a pair $(x,y) \sim P$, independent of $S$, and replace the $i$-th observation $(x_{i},y_{i})$ of $S$ by $(x,y)$. This yields a modified training set, denoted $S^{i}$, and hence a (generally different) output $f_{S^{i}}$ of the learning algorithm. Now the following quantity is defined:

$$\mathbb{E}_{S} \; \mathbb{E}_{(x,y) \sim P} \; \mathbb{E}_{i} \big[ L(f_{S^{i}}(x_{i}), y_{i}) \big]$$

where $\mathbb{E}_{i}$ means that the index $i$ is drawn uniformly from $\{1,\dots,n\}$, i.e. we replace a uniformly chosen observation of $S$ by $(x,y)$.
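(If I unpack the uniform average $\mathbb{E}_{i}$ explicitly, the quantity reads

$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{S \sim P^{n}}\,\mathbb{E}_{(x,y)\sim P}\big[L\big(f_{S^{i}}(x_{i}),\,y_{i}\big)\big],$$

where each summand depends on $i$ only through which observation gets swapped out.)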

This is confusing me. Isn't that simply $$\mathbb{E}_{S}[\mathcal{R}(f_{S})]$$ from above? If so, how can I prove it rigorously?
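As a numerical sanity check, here is a small Monte Carlo sketch under hand-picked assumptions of my own (squared loss, $y \sim \mathcal{N}(0,1)$ with no dependence on $x$, and the trivial algorithm that predicts the training mean); for this toy setup both quantities should land near $1 + 1/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 50_000

def fit(ys):
    # learning algorithm A: always predict the training mean (ignores x)
    return ys.mean()

risk_plain = 0.0  # Monte Carlo estimate of E_S[R(f_S)]
risk_repl = 0.0   # Monte Carlo estimate of E_S E_{(x,y)} E_i [L(f_{S^i}(x_i), y_i)]

for _ in range(trials):
    ys = rng.standard_normal(n)    # training responses, y ~ N(0, 1)
    y_new = rng.standard_normal()  # fresh independent draw (x plays no role here)

    # plain expected risk: fit on S, evaluate on the fresh point
    risk_plain += (fit(ys) - y_new) ** 2

    # replace-one version: swap a uniformly chosen observation for the
    # fresh draw, then evaluate the loss at the *replaced* point (x_i, y_i)
    i = rng.integers(n)
    ys_i = ys.copy()
    ys_i[i] = y_new
    risk_repl += (fit(ys_i) - ys[i]) ** 2

risk_plain /= trials
risk_repl /= trials
print(risk_plain, risk_repl)  # both should be close to 1 + 1/n = 1.2
```

The two estimates agree to within Monte Carlo error, which is at least consistent with my suspicion that the two quantities are equal in general.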