I'm trying to understand the notation used on p. 18 of The Elements of Statistical Learning, and I suspect it contains notational errors. What do the authors mean, and if the notation is indeed wrong, what would be the correct notation in the following passage?
" We [...] place ourselves in the world of random variables and probability spaces. Let $X \in \mathbb{R}^p$ denote a real valued random input vector, and $Y \in \mathbb{R}$ a real valued random output variable, with joint distribution $\Pr(X,Y)$. We seek a function $f(X)$ for predicting $Y$ given values of the input $X$. This theory requires a loss function [...] $L(Y, f(X)) = (Y-f(X))^2$. [...]
$$E(Y-f(X))^2 = \int[y-f(x)]^2\ \Pr(dx, dy)$$ "
My attempt:
At first I took $X$ and $Y$ to be random variables, given the context and the subsequent use of lowercase $x$ and $y$; but as written, $X \in \mathbb{R}^p$ is a vector and $Y \in \mathbb{R}$ is a real number, so neither is even a function.
I'll assume they meant $X : \Omega_1 \rightarrow \mathbb{R}^p$ and $Y : \Omega_2 \rightarrow \mathbb{R}$.
I interpret $\Pr(X,Y)$ as $\mathbb{P}(X=x,Y=y)$. (Related: $\Pr$ vs $\mathbb{P}$)
And $E\ldots := E[\ldots]$. Or should I say $\mathbb{E}[\ldots]$?
With these assumptions, is it then correct to write $$E(Y-f(X))^2 = \int[y-f(x)]^2\ \Pr(dx, dy)$$ whereas I would instead expect $$E(Y-f(X))^2 = \int_{\mathbb{R}^p \times \mathbb{R}} [y-f(x)]^2\ \mathbb{P}(X=x, Y=y)\ d(x,y)\ ?$$
Whoever wrote this shouldn't have written $\Pr(X,Y)$.
One can write $\Pr(X=x)$, and the capital $X$ is the random variable, and the whole expression is a function of $x$, which could be anything in the range. When writing like that, the expression $$ \sum_{x=1}^3 \Pr(X=x) $$ means $$ \Pr(X=1)+\Pr(X=2)+\Pr(X=3) $$ and you don't see lower-case $x$ in there anywhere.
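To make the point about the bound variable concrete, here is a minimal sketch in Python. The probability mass function `pmf` and the helper `pr_X_equals` are made up purely for illustration; nothing about them comes from the book.

```python
from fractions import Fraction

# Hypothetical pmf of a discrete random variable X with values 1, 2, 3, 4
# (an assumption for illustration only).
pmf = {
    1: Fraction(1, 10),
    2: Fraction(2, 10),
    3: Fraction(3, 10),
    4: Fraction(4, 10),
}

def pr_X_equals(x):
    """Return Pr(X = x); zero for values outside the support."""
    return pmf.get(x, Fraction(0))

# The summation index is a bound (dummy) variable: renaming it changes nothing.
total_with_x = sum(pr_X_equals(x) for x in range(1, 4))
total_with_k = sum(pr_X_equals(k) for k in range(1, 4))

# The expanded form contains no lower-case x at all.
expanded = pr_X_equals(1) + pr_X_equals(2) + pr_X_equals(3)

print(total_with_x, total_with_k, expanded)  # prints: 3/5 3/5 3/5
```

The capital `X` never appears as a Python variable here: it only names which distribution `pr_X_equals` refers to, while the lower-case index is consumed by the sum.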
However, the authors seem to contemplate a distribution that might not be discrete, so $\Pr(dx,dy)$ cannot be read as $\Pr(X=x\ \&\ Y=y)$. Rather, $\Pr$ is a notational device referring to a probability measure on $\mathbb R^p\times\mathbb R$ (the joint distribution of $(X,Y)$) that has the property that, for measurable sets $A\subseteq\mathbb R^p$ and $B\subseteq\mathbb R$, $$ \operatorname{probability}(X\in A\ \&\ Y\in B) = \int_{(x,y)\in A\times B} 1\, \Pr(dx,dy) $$ so that, for a function $g$ of $(x,y)$ whose expectation exists, $$ \operatorname{E}(g(X,Y)) = \int_{\mathbb R^p\times\mathbb R} g(x,y)\, \Pr(dx,dy). $$
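As a sketch of how this relates to the formula expected in the question: suppose, as an extra assumption that the book does not make, that the joint distribution of $(X,Y)$ has a density $h(x,y)$ with respect to Lebesgue measure on $\mathbb R^p\times\mathbb R$ (the symbol $h$ is introduced here only for illustration). Then $\Pr(dx,dy)$ is shorthand for $h(x,y)\,dx\,dy$, and $$ \operatorname{E}(Y-f(X))^2 = \int_{\mathbb R^p\times\mathbb R} [y-f(x)]^2\, h(x,y)\, dx\, dy. $$ If instead $(X,Y)$ is discrete, the same integral reduces to a sum over the possible values, $$ \operatorname{E}(Y-f(X))^2 = \sum_{(x,y)} [y-f(x)]^2\, \Pr(X=x\ \&\ Y=y). $$ In the density case $\Pr(X=x\ \&\ Y=y)=0$ for every $(x,y)$, which is why that probability cannot stand in for $h(x,y)$ inside the integral; the measure notation $\Pr(dx,dy)$ covers both cases at once.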