Why is the risk equal to the empirical risk when taking the expectation over the samples?


From Understanding Machine Learning: From theory to algorithms:

Let $S = (z_1, \ldots, z_m)$ be a sample of $m$ points drawn i.i.d. from a distribution $D$ over a set $Z$, and let $w^*$ be an arbitrary vector. Then $\Bbb E_{S \sim D^m}[L_S(w^*)] = L_D(w^*)$.

Where: $L_S(w^*) \equiv \frac{1}{m}\sum_{i=1}^m l(w^*, z_i)$ with $z_i \in S$, $L_D(w^*) \equiv \Bbb E_{z \sim D}[l(w^*, z)]$, $D$ is a distribution on $Z$, and $l(\cdot\,,\cdot)$ is a loss function.

I see that $$\Bbb E_S[L_S(w^*)] = \Bbb E_S[\frac{1}{m}\sum_{i=1}^ml(w^*, z_i)] = \frac{1}{m}\sum_{i=1}^m \Bbb E_S[l(w^*, z_i)]$$ and $$L_D(w^*) = \Bbb E_z[l(w^*, z)] = \sum_{z \in Z} l(w^*, z)D(z)$$

But how are these two equal? $\Bbb E_S$ is an expectation over samples $S$ of size $m$ whereas $\Bbb E_z$ is an expectation over all samples in $Z$.

1 Answer

The sample is i.i.d., so taking the expectation with respect to the sample is equivalent to taking the expectation with respect to each data point separately, and these expectations are all equal ("identically distributed").

$$\frac{1}{m}\sum_{i=1}^m \Bbb E_S[l(w^*, z_{i})] = \frac{1}{m}\sum_{i=1}^m \Bbb E_{z_{i} \sim D}[l(w^*, z_{i})] = \frac{1}{m}\sum_{i=1}^m \Bbb E_{z \sim D}[l(w^*, z)] = \frac{1}{m}\sum_{i=1}^m L_{D}(w^*) = L_{D}(w^*)$$

The first two expressions are equal because each term depends only on a single random variable, $z_{i}$ (it is constant with respect to the other $z_{j}$, and the expectation of a constant is just that constant), so $\Bbb E_S$ reduces to $\Bbb E_{z_i}$ term by term. The second and third are equal because every $z_i$ has the same marginal distribution $D$.
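The identity is easy to check numerically: averaging the empirical risk $L_S(w^*)$ over many independent samples $S$ should converge to the true risk $L_D(w^*)$. A minimal sketch, with an assumed toy setup ($Z = \{0,1,2,3\}$, $D$ uniform, squared loss, $w^* = 1.5$ — none of this is from the book, it is just for illustration):

```python
import random

random.seed(0)

# Hypothetical setup: Z = {0, 1, 2, 3}, D uniform on Z,
# squared loss l(w, z) = (w - z)^2, and w* = 1.5.
Z = [0, 1, 2, 3]
w_star = 1.5

def loss(w, z):
    return (w - z) ** 2

# True risk L_D(w*) = E_{z ~ D}[l(w*, z)]; here D is uniform,
# so the expectation is a plain average over Z.
L_D = sum(loss(w_star, z) for z in Z) / len(Z)

# Average the empirical risk L_S(w*) over many i.i.d. samples S of size m.
m, trials = 5, 200_000
avg = 0.0
for _ in range(trials):
    S = [random.choice(Z) for _ in range(m)]       # S ~ D^m
    L_S = sum(loss(w_star, z) for z in S) / m       # empirical risk on S
    avg += L_S
avg /= trials

print(L_D)   # 1.25
print(avg)   # close to 1.25 (Monte Carlo estimate of E_S[L_S(w*)])
```

Note that the sample size $m$ plays no role in the expectation itself; it only affects the variance of $L_S(w^*)$ around $L_D(w^*)$.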