In the philosophy of Statistical Learning Theory, risk minimization is the focus. We want to minimize $$ R(f)=\mathbb{E}[L(f(x),y)]=\int L(f(x),y)\,dP(x,y), $$ where $P(x,y)$ is the "joint probability distribution".
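To make the expectation concrete for myself, here is a small sketch (my own assumptions: squared loss, a simple Gaussian joint for $P(x,y)$, and a linear $f$) where $R(f)$ is approximated by the empirical risk on samples drawn from $P(x,y)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution P(x, y): y = 2x + Gaussian noise.
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

def f(x):
    """A candidate predictor (assumed linear, for illustration only)."""
    return 2.0 * x

def L(fx, y):
    """Squared loss, assumed for illustration."""
    return (fx - y) ** 2

# Empirical risk: a Monte Carlo estimate of R(f) = E[L(f(x), y)].
R_hat = np.mean(L(f(x), y))
print(R_hat)  # should be close to the noise variance, 0.25
```

This is just the sample-average version of the integral above; my question is about what the integral itself means exactly.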
Is it a "joint probability density/mass function (pdf/pmf)" or a "joint cumulative distribution function (CDF)"?
I understand $R(f)=\mathbb{E}[L(f(x),y)]$, but I think the integral should be something like $$ \int_{\mathcal{X}\times\mathcal{Y}} L(f(x),y)\,p(x,y)\,d??, $$ where $\mathcal{X}\times\mathcal{Y}$ is the space of the pairs $(x,y)$, but I don't know how to write the $d??$ part.
It seems that the $dP(x,y)$ in $\int L(f(x),y)\,dP(x,y)$ is not the "change of variable" from undergraduate calculus, as in $$ \mathbb{E}[f(X)]=\int_{-\infty}^\infty f(x)g'(x)\,dx=\int_{-\infty}^\infty f(x)\,dg(x), $$ which is a Riemann–Stieltjes integral, where $g(x)$ is a CDF and $g'(x)$ is the pdf.
So I guess it is not a Riemann–Stieltjes integral (not the "change of variable" / integration by substitution), but a Lebesgue integral? However, I have very little knowledge of Lebesgue integration and measure theory.
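To summarize my tentative reading in formulas (this is my own guess at the two special cases, not something taken from a book):

```latex
% If P has a joint density p(x,y), I read dP(x,y) as p(x,y)\,dx\,dy:
R(f) = \int_{\mathcal{X}\times\mathcal{Y}} L(f(x),y)\, p(x,y)\, dx\, dy.
% If (x,y) is discrete with joint pmf p(x,y), the integral becomes a sum:
R(f) = \sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} L(f(x),y)\, p(x,y).
```

Is the notation $dP(x,y)$ just a unified way of covering both cases at once?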