I have read this question and I am confused by part of the first answer, even though the same point was already raised in the comments there.
I don't understand why $$L_{(\mathcal{D}, f)}(h^{*}) = 0 \implies L_{S}(h^{*}) = 0$$
Why, if the "true error" is equal to $0$, i.e. $L_{(\mathcal{D}, f)}(h^{*}) = 0$, does it follow that the "training error" is also equal to $0$, i.e. $L_{S}(h^{*}) = 0$?
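For reference, these are the definitions I am assuming. As far as I can tell they are the standard ones from the book, but the sample notation $S = ((x_1, y_1), \dots, (x_m, y_m))$ with $y_i = f(x_i)$ and the sample size $m$ are my own labels here:

$$L_{(\mathcal{D}, f)}(h) = \mathbb{P}_{x \sim \mathcal{D}}\left[h(x) \neq f(x)\right], \qquad L_{S}(h) = \frac{\left|\{\, i \in \{1, \dots, m\} : h(x_i) \neq y_i \,\}\right|}{m}$$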
Also, I would like to clarify my understanding of $\mathcal{D}$.
My understanding of $\mathcal{D}$ is that it is some probability distribution over the input space $\mathcal{X}$ (as stated in the book).
For example, let $\mathcal{X}=\{x_1,x_2,x_3\}$ and $\mathcal{Y}=\{0,1\}$. Let $f(x_i)=1$ if $x_i\geq 0$ and $0$ otherwise. Then let $\mathcal{D}$ be such that $P(x_1\geq 0)=1/2$, $P(x_2\geq 0)=1/3$, $P(x_3 \geq 0)=1/4$. Is this a reasonable example of how to interpret $\mathcal{D}$?
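To make the setup concrete, here is a minimal sketch of an alternative reading. It assumes that $\mathcal{D}$ assigns a probability to each point of a finite $\mathcal{X}$, which may not match the example above. The numeric points, the weights, and the names `true_error`, `training_error`, `draw_sample`, and `h_star` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy distribution D over a finite domain X = {-1.0, 0.5, 2.0}.
# Each key is a point of X, each value is its probability under D (assumed).
D = {-1.0: 0.5, 0.5: 0.3, 2.0: 0.2}


def f(x):
    """Labeling function from the question: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def h_star(x):
    """A hypothesis that agrees with f everywhere (assumed for illustration)."""
    return 1 if x >= 0 else 0


def true_error(h, f, D):
    """Probability mass of the points where h and f disagree."""
    return sum(p for x, p in D.items() if h(x) != f(x))


def draw_sample(D, f, m, rng):
    """Draw m points i.i.d. from D and label each one with f."""
    points = rng.choices(list(D.keys()), weights=list(D.values()), k=m)
    return [(x, f(x)) for x in points]


def training_error(h, S):
    """Fraction of sample points that h labels incorrectly."""
    return sum(1 for x, y in S if h(x) != y) / len(S)


rng = random.Random(0)  # fixed seed so the sample is reproducible
S = draw_sample(D, f, m=20, rng=rng)

print("true error of h_star:", true_error(h_star, f, D))
print("training error of h_star:", training_error(h_star, S))
```

In this sketch the sample labels come from the same $f$ that defines the true error, so any point on which `h_star` could be wrong in `S` would also have to carry positive mass under `D`.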