Why can the test error be written in terms of the training error in this way?


In the picture below, the identity I am asking about is circled in red:

If $L_D(h) = E_{z \sim D}[l(h, z)]$,

then how is $L_D(h) = E_{S' \sim D^m}[L_{S'}(h)]$?

I see that $$\large L_D(h) = E_{z \sim D}[l(h, z)] = \sum_{z \in Z} l(h,z)D(z)$$ where $D$ is the distribution over $Z$, the set of examples. The expectation over samples $S'$ should then expand as:

$$\large E_{S' \sim D^m}[L_{S'}(h)] = \sum_{S'}\left[\frac{1}{m}\sum_{z_i \in S'}l(h,z_i)\right]D^m(S')$$
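A quick numerical sanity check suggests the two expressions do agree, at least on a toy case. This is only a sketch under made-up assumptions: the three-point domain, the probabilities in `D`, the loss values in `loss`, and the sample size `m` are all hypothetical, and $S'$ is treated as an ordered $m$-tuple drawn i.i.d. from $D$, so that $D^m(S')$ is the product of the individual probabilities.

```python
from itertools import product

# Hypothetical toy setup (not from the book): a three-point domain Z,
# a distribution D on Z, and loss values l(h, z) for one fixed hypothesis h.
D = {"a": 0.5, "b": 0.3, "c": 0.2}
loss = {"a": 0.0, "b": 1.0, "c": 0.25}
m = 3  # sample size, chosen arbitrarily


def true_risk():
    """L_D(h) = sum over z in Z of l(h, z) * D(z)."""
    return sum(loss[z] * D[z] for z in D)


def expected_empirical_risk():
    """E_{S' ~ D^m}[L_{S'}(h)], summing over every ordered m-tuple S' in Z^m."""
    total = 0.0
    for sample in product(D, repeat=m):
        # D^m(S') is the product of the individual probabilities (i.i.d. draws).
        prob = 1.0
        for z in sample:
            prob *= D[z]
        # L_{S'}(h) is the average loss over the m points in the sample.
        empirical_risk = sum(loss[z] for z in sample) / m
        total += empirical_risk * prob
    return total


# The two printed values should coincide up to floating-point error.
print(true_risk(), expected_empirical_risk())
```

Changing `m` or the numbers in `D` and `loss` leaves the two values matching, which confirms the identity numerically but does not explain it.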

But how are these two equal?


[Image: the passage in question, with the identity circled in red]