The book The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (page 18) defines the expected (squared) prediction error as \begin{align} \operatorname{EPE}(f) &= \operatorname E(Y - f(X))^2\\ & = \int [y - f(x)]^2 \Pr(dx, dy) \end{align}
Why is it written this way? Why not as below, to be consistent with the usual definition of expected value? $$ \operatorname{EPE}(f) = \operatorname E(Y - f(X))^2 = \iint [y - f(x)]^2 \Pr(x,y)\,dx\,dy $$ What does $\Pr(dx, dy)$ even mean?
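By the definition of a density function, the probability that $X$ lies in the infinitesimal interval $[x, x + dx]$ is

$$\Pr(dx) = p(x)\,dx,$$

where $p$ is the density of $X$ (written $p$ rather than $f$, since $f$ already denotes the predictor).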
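A quick numerical check of $\Pr(dx) \approx p(x)\,dx$, assuming purely for illustration that $X$ is standard normal and picking the arbitrary point $x = 0.5$: the exact probability of $[x, x + dx]$, computed from the CDF, divided by $p(x)\,dx$ should approach 1 as $dx$ shrinks. The function names are my own.

```python
import math

# Assumption for illustration only: X ~ N(0, 1).

def normal_pdf(x: float) -> float:
    """Density p(x) of a standard normal random variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x: float) -> float:
    """CDF of a standard normal, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_prob(x: float, dx: float) -> float:
    """Exact probability that X lies in [x, x + dx]."""
    return normal_cdf(x + dx) - normal_cdf(x)

if __name__ == "__main__":
    x = 0.5  # arbitrary evaluation point
    for dx in (0.1, 0.01, 0.001):
        exact = interval_prob(x, dx)
        approx = normal_pdf(x) * dx  # the p(x) dx approximation
        print(f"dx={dx:<6} Pr={exact:.8f} p(x)dx={approx:.8f} ratio={exact / approx:.6f}")
```

The ratio's deviation from 1 shrinks roughly in proportion to $dx$, which is what "infinitesimal" buys you.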
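Extending the same idea to two variables, the probability that $(X, Y)$ lies in the infinitesimal rectangle $[x, x + dx] \times [y, y + dy]$ is

$$\Pr(dx, dy) = p(x, y)\,dx\,dy,$$

where $p(x, y)$ is the joint density of $(X, Y)$. Substituting this into the book's integral gives $\iint [y - f(x)]^2 p(x, y)\,dx\,dy$, which is exactly the double integral in the question. The $\Pr(dx, dy)$ notation is simply more general: it still makes sense when $(X, Y)$ has no joint density (for example, when $Y$ is discrete).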
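As a numerical sanity check (a sketch under my own assumptions, nothing from the book), take a hypothetical joint distribution with $X \sim N(0, 1)$ and $Y = 2X + \varepsilon$, where $\varepsilon \sim N(0, 1)$ is independent of $X$, and a hypothetical predictor $f(x) = 1.5x$. Then $Y - f(X) = 0.5X + \varepsilon$, so the closed-form EPE is $0.25 + 1 = 1.25$. The code approximates $\iint [y - f(x)]^2 p(x, y)\,dx\,dy$ with a midpoint rule on a truncated grid and compares it with a plain sample average of $(Y - f(X))^2$; both should land near $1.25$. The names `joint_density`, `epe_double_integral`, `epe_monte_carlo`, the slope, the predictor and the grid limits are illustrative choices.

```python
import math
import random

SLOPE = 2.0  # hypothetical data-generating model: Y = SLOPE * X + noise

def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def joint_density(x: float, y: float) -> float:
    """p(x, y) = p(x) * p(y | x) with X ~ N(0, 1) and Y | X = x ~ N(SLOPE * x, 1)."""
    return normal_pdf(x) * normal_pdf(y - SLOPE * x)

def f(x: float) -> float:
    """Hypothetical predictor whose EPE we evaluate."""
    return 1.5 * x

def epe_double_integral(h: float = 0.1, x_lim: float = 6.0, y_lim: float = 20.0) -> float:
    """Midpoint-rule approximation of the double integral of [y - f(x)]^2 p(x, y) dx dy
    over the box [-x_lim, x_lim] x [-y_lim, y_lim]; the mass outside the box is negligible."""
    nx = int(round(2 * x_lim / h))
    ny = int(round(2 * y_lim / h))
    total = 0.0
    for i in range(nx):
        x = -x_lim + (i + 0.5) * h
        fx = f(x)
        for j in range(ny):
            y = -y_lim + (j + 0.5) * h
            total += (y - fx) ** 2 * joint_density(x, y)
    return total * h * h  # each cell contributes integrand * dx * dy

def epe_monte_carlo(n: int = 100_000, seed: int = 0) -> float:
    """Sample average of (Y - f(X))^2 over n draws from the same joint distribution."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = SLOPE * x + rng.gauss(0.0, 1.0)
        acc += (y - f(x)) ** 2
    return acc / n

if __name__ == "__main__":
    # Closed form for this setup: Y - f(X) = 0.5 X + noise, so EPE = 0.25 + 1 = 1.25.
    print("closed form     :", 1.25)
    print("double integral :", round(epe_double_integral(), 6))
    print("Monte Carlo mean:", round(epe_monte_carlo(), 4))
```

The double integral is the "density" reading of $\int [y - f(x)]^2 \Pr(dx, dy)$, while the sample average estimates $\operatorname E(Y - f(X))^2$ directly; agreement between them is the point of the exercise.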