I originally posted this question on Cross Validated but thought it might be more relevant here, since the answer I seek involves mathematical manipulation more than statistical technique.
Problem Statement
Let $h \in \mathcal{H}$ be a hypothesis from some class of binary classifiers $\mathcal{H}$. Show that $$\mathbb{E}_{\mathcal{D}_n}\left[R_e(h)\right] = R(h)$$ where the expectation on the LHS is over all possible training datasets $\mathcal{D}_n$ of size $n$.
- $R_e(h)$ is the empirical risk of the hypothesis $h$ over a given dataset $\mathcal{D}_n = \{x_1, \dots, x_n\}$. It is defined as
$$R_e(h) = \frac1n\sum_{i=1}^{n}\mathcal{L}(x_i, h)$$
- Here $\mathcal{L}$ is the loss function for the binary classification problem, defined as $$\mathcal{L}(x,h) = \begin{cases} 1, & s(x) \neq h(x) \\ 0, & \text{otherwise} \end{cases} $$
- $s(x)$ is the system we are trying to model.
- $R(h)$ is the true risk of the hypothesis $h$.
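The definitions above can be checked numerically. The following is a minimal sketch, not a proof, under assumptions that are not part of the problem: the inputs are drawn i.i.d. from a uniform distribution on $[0, 1]$, the system `s` and the hypothesis `h` are hypothetical threshold functions, and `h` is fixed in advance rather than chosen from the data.

```python
import random

def s(x):
    # Hypothetical target system (assumption): label 1 when x > 0.5.
    return 1 if x > 0.5 else 0

def h(x):
    # Hypothetical fixed hypothesis (assumption): label 1 when x > 0.3.
    return 1 if x > 0.3 else 0

def loss(x, hyp):
    # 0-1 loss: 1 when the hypothesis disagrees with the system at x.
    return 1 if s(x) != hyp(x) else 0

def empirical_risk(hyp, dataset):
    # R_e(h): average 0-1 loss over one dataset.
    return sum(loss(x, hyp) for x in dataset) / len(dataset)

def average_empirical_risk(hyp, n, num_datasets, rng):
    # Monte Carlo estimate of E[R_e(h)] over datasets of size n,
    # each drawn i.i.d. from Uniform(0, 1) (assumption).
    total = 0.0
    for _ in range(num_datasets):
        dataset = [rng.random() for _ in range(n)]
        total += empirical_risk(hyp, dataset)
    return total / num_datasets

rng = random.Random(0)
estimate = average_empirical_risk(h, n=20, num_datasets=20000, rng=rng)

# Under these assumptions h and s disagree exactly on (0.3, 0.5],
# so the true risk is P(0.3 < x <= 0.5) = 0.2.
true_risk = 0.2
print(estimate, true_risk)
```

With these choices the averaged empirical risk should land close to the true risk, which is what the identity predicts for a hypothesis that does not depend on the sampled dataset.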
My work
$$R_e(h) = \frac1n\sum_{i=1}^{n}\mathcal{L}(x_i, h)$$
$$\mathbb{E}_{\mathcal{D}_n}\left[R_e(h)\right] = \int_{\mathcal{D}_n} R_e(h)\,p(\mathcal{D}_n)\,d\mathcal{D}_n$$
$$= \frac{1}{n}\int_{\mathcal{D}_n}\sum_{x_i \in \mathcal{D}_n}\mathcal{L}(x_i, h)\,p(\mathcal{D}_n)\,d\mathcal{D}_n$$
Since I want to manipulate this into $R(h) = \int_{x}\mathcal{L}(x,h)\,p(x)\,dx$, I thought of factoring the $x_i$ terms out of the expression above. But then I couldn't find a way to bring the term $p(x)$ into the picture, and this is where I am stuck.
I am looking for progressive hints that will help me solve this myself. Thanks!