In the book *The Elements of Statistical Learning*, the authors say that the expected prediction error at an arbitrary test point $x_0$ is: $$EPE(x_0) = E_{y_0 | x_0}E_\mathcal{T}(y_0 -\hat{y}_0)^2$$
where $\mathcal{T}$ denotes the training set and $\hat{y}_0$ is our prediction at the point $x_0$.
My question is:
If the inner quantity $E_\mathcal{T}(y_0 -\hat{y}_0)^2$ already yields a single number (because an expectation is essentially a mean), why do we need to take another expectation, $E_{y_0 | x_0}$, on top of it? What is the point of averaging a single value?
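To make the question concrete, here is a small Monte Carlo sketch of the two expectations (all choices here, such as the true function $f(x) = 2x$, the OLS fit, and the sample sizes, are my own illustrative assumptions, not from the book): the inner average is over fresh training sets $\mathcal{T}$, the outer one over fresh draws of $y_0 \mid x_0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: y = 2x + Gaussian noise, OLS line fit, test point x0 = 1.5
def true_f(x):
    return 2.0 * x

x0 = 1.5
n_train, sigma = 20, 1.0
n_T, n_y0 = 500, 500   # Monte Carlo draws of training sets and of y0 | x0

outer_terms = []
for _ in range(n_T):
    # Draw a fresh training set T and fit a least-squares line
    x = rng.uniform(0.0, 3.0, size=n_train)
    y = true_f(x) + rng.normal(0.0, sigma, size=n_train)
    beta = np.polyfit(x, y, deg=1)
    yhat0 = np.polyval(beta, x0)          # prediction at x0 from this T

    # Draw y0 | x0 and average the squared error over those draws
    y0 = true_f(x0) + rng.normal(0.0, sigma, size=n_y0)
    outer_terms.append(np.mean((y0 - yhat0) ** 2))

# Averaging over both sources of randomness approximates EPE(x0)
epe = np.mean(outer_terms)
print(epe)
```

In this sketch the estimate lands near $\sigma^2$ plus the variance of $\hat{y}_0$ across training sets, which is what motivates the question: the randomness of $y_0$ and the randomness of $\mathcal{T}$ are separate sources of variation.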