Problem with conditional expected value


In the book The Elements of Statistical Learning, the author writes the expected prediction error at an arbitrary test point $x_0$ as: $$EPE(x_0) = E_{y_0 | x_0}E_\mathcal{T}(y_0 -\hat{y}_0)^2$$

where $\mathcal{T}$ denotes the training set and $\hat{y}_0$ is our prediction at the point $x_0$.

My question is:

If the inner quantity $E_\mathcal{T}(y_0 -\hat{y}_0)^2$ yields a single number (since an expectation is essentially a mean), why do we have to take another mean, given by $E_{y_0 | x_0}$?

Why would we take a second expectation (i.e. a mean) of a single value? What is the point of averaging over one value?

Answer:
  • The randomness in $\hat{y}_0$ comes from the randomness of training data $\mathcal{T}$.
  • The randomness in $y_0$ (conditioned on the test point $x_0$) is assumed to be independent of the training data $\mathcal{T}$.
  • The inner expectation is with respect to the training data, and thus averages over the randomness coming from $\hat{y}_0$, but treats $y_0$ as a fixed quantity. Thus you can think of $E_{\mathcal{T}}(y_0 - \hat{y}_0)^2$ as a function of $y_0$, say $g(y_0)$, which still retains the randomness in $y_0$. Then the outer expectation $E_{y_0 \mid x_0} g(y_0)$ will average over the randomness in $y_0$.
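The two layers can be made concrete with a small Monte Carlo sketch. Everything here (the linear true function, the noise level, the sample sizes) is an illustrative assumption of my own, not from the book: the inner average over many training sets produces one number for each fixed $y_0$, i.e. the function $g(y_0)$, and the outer average then integrates $g$ over the randomness in $y_0$ given $x_0$.

```python
import numpy as np

# Illustrative setup (assumed, not from the book):
# true model y = f(x) + noise, fitted by ordinary least squares.
rng = np.random.default_rng(0)
sigma = 1.0          # noise standard deviation of y given x
x0 = 0.5             # the fixed test point

def f(x):
    return 2.0 * x   # assumed true regression function

def predict_at_x0(n=20):
    """Draw ONE training set T and return the fitted prediction at x0."""
    x = rng.uniform(-1.0, 1.0, n)
    y = f(x) + rng.normal(0.0, sigma, n)
    A = np.column_stack([x, np.ones(n)])         # design matrix
    slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
    return slope * x0 + intercept

# Inner expectation E_T: average over many training sets.
# For a FIXED y0 this is one number -- but it is a function of y0.
preds = np.array([predict_at_x0() for _ in range(2000)])

def g(y0):
    return np.mean((y0 - preds) ** 2)   # g(y0) ~= E_T (y0 - yhat0)^2

# Outer expectation E_{y0|x0}: y0 is itself random given x0,
# so we average g over draws of y0 | x0.
y0_draws = f(x0) + rng.normal(0.0, sigma, 2000)
epe = np.mean([g(y0) for y0 in y0_draws])
print(epe)
```

With these assumptions the result should sit near $\sigma^2 + \mathrm{Var}_\mathcal{T}(\hat{y}_0) + \text{bias}^2$: the outer expectation is what brings in the irreducible noise term $\sigma^2$, which the inner expectation alone, with $y_0$ held fixed, cannot see.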