Notation of expectation and random variables


I'm trying to understand the notation used on p. 18 of The Elements of Statistical Learning, and I suspect it contains notational errors. What do the authors mean, and if there are errors, what would be the correct notation for the following passage?

" We [...] place ourselves in the world of random variables and probability spaces. Let $X \in \mathbb{R}^p$ denote a real valued random input vector, and $Y \in \mathbb{R}$ a real valued random output variable, with joint distribution $\Pr(X,Y)$. We seek a function $f(X)$ for predicting $Y$ given values of the input $X$. This theory requires a loss function [...] $L(Y, f(X)) = (Y-f(X))^2$. [...]

$$E(Y-f(X))^2 = \int[y-f(x)]^2\ \Pr(dx, dy)$$ "


My attempt:

At first I thought $X$ and $Y$ are random variables, given the context and the subsequent use of $y$ and $x$; but as written, $X \in \mathbb{R}^p$ is a vector and $Y \in \mathbb{R}$ is a real number, and neither is a function, so neither can literally be a random variable.

I'll assume they meant $X : \Omega_1 \rightarrow \mathbb{R}^p$ and $Y : \Omega_2 \rightarrow \mathbb{R}$.

I interpret $\Pr(X,Y)$ as $\mathbb{P}(X=x,Y=y)$. (Related: $\Pr$ vs $\mathbb{P}$)

And $E\ldots := E[\ldots]$. Or should I say $\mathbb{E}[\ldots]$?

With these assumptions, is it then correct to say: $$E(Y-f(X))^2 = \int[y-f(x)]^2\ \Pr(dx, dy)$$ where instead I would expect: $$E(Y-f(X))^2 = \int_{\mathbb{R}^p \times \mathbb{R}} [y-f(x)]^2\ \mathbb{P}(X=x, Y=y)\ d(x,y)$$


There are 2 answers below.

Best answer:

Whoever wrote this shouldn't have written $\Pr(X,Y)$.

One can write $\Pr(X=x)$, and the capital $X$ is the random variable, and the whole expression is a function of $x$, which could be anything in the range. When writing like that, the expression $$ \sum_{x=1}^3 \Pr(X=x) $$ means $$ \Pr(X=1)+\Pr(X=2)+\Pr(X=3) $$ and you don't see lower-case $x$ in there anywhere.
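A small Python sketch of this discrete case may help; the pmf values below are made up for illustration, not taken from the book:

```python
# Toy pmf for a discrete random variable X taking values 1, 2, 3.
# The probabilities are illustrative only.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

# sum_{x=1}^{3} Pr(X = x): the lower-case x is just a summation index,
# while capital X (the random variable) lives inside the pmf.
total = sum(pmf[x] for x in range(1, 4))

# Likewise, E[X] = sum_x x * Pr(X = x) in the discrete case.
EX = sum(x * p for x, p in pmf.items())
print(total, EX)
```

Here `total` is $1$ (a pmf sums to one) and `EX` is $2.1$ for these toy values, and no lower-case $x$ survives in either result.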

However this author seems to contemplate a distribution that might not be discrete, so $\Pr(x,y)$ would not mean $\Pr(X=x\ \&\ Y=y)$. Rather, $\Pr(x,y)$ is a notational device referring to a probability measure on $\mathbb R^2$ that has the property that $$ \operatorname{probability}(X\in A\ \&\ Y\in B) = \int_{(x,y)\in A\times B} 1\, \Pr(dx,dy) $$ so that $$ \operatorname{E}(g(X,Y)) = \int_{\mathbb R^2} g(x,y)\, \Pr(dx,dy). $$
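The last identity can be checked numerically by Monte Carlo: the sample average of $g(X,Y)$ converges to the integral of $g$ against $\Pr(dx,dy)$. A minimal sketch, with an arbitrarily chosen joint law (independent standard normals, my own toy choice) and $g(x,y) = x^2 + y^2$, so the exact value is $E[X^2] + E[Y^2] = 2$:

```python
import random

random.seed(0)

def g(x, y):
    return x**2 + y**2

# Draw from an assumed joint distribution: X, Y independent N(0, 1).
n = 100_000
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

# Monte Carlo estimate of E[g(X, Y)]; by the law of large numbers this
# approximates the integral of g(x, y) against Pr(dx, dy).
estimate = sum(g(x, y) for x, y in samples) / n
print(estimate)  # close to 2
```

Nothing here depends on the joint law being discrete or having a density; the average only uses draws from the distribution, which is exactly the point of the $\Pr(dx,dy)$ notation.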

Second answer:

The notation is indeed lacking at some points. $X$ and $Y$ are $\mathbb{R}^p$- and $\mathbb{R}$-valued random variables, respectively. In this context they must be functions on a common set $\Omega$, so $X: \Omega \to \mathbb{R}^p$ and $Y:\Omega \to \mathbb{R}$ are measurable functions.

$\Pr(X, Y)$ denotes the joint distribution of $X$ and $Y$. If $\mathbb{P}$ denotes the probability measure on the space $\Omega$, then this simply means $\Pr(X, Y) = \mathbb{P}^{(X, Y)} = \mathbb{P} \circ (X, Y)^{-1}$ [this is a measure on $\mathbb{R}^p \times \mathbb{R}$]. Now the $L^2$-distance between $f(X)$ and $Y$ is simply $$\mathbb{E}[(Y - f(X))^2] = \int_{\mathbb{R}^p \times \mathbb{R}} (y - f(x))^2 \; d\mathbb{P}^{(X, Y)}(x, y).$$

This simply means that you integrate the function $(x, y) \mapsto (y - f(x))^2$ with respect to the measure which is the joint distribution of $X$ and $Y$. Note that this is not equivalent to your last formula: when the joint distribution is continuous, $\mathbb{P}(X=x, Y=y) = 0$ for every single point $(x,y)$, so your integrand would vanish identically.
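To make the difference concrete, here is a sketch with a continuous joint law of my own choosing (not from the book): $X \sim N(0,1)$, $Y = 2X + \varepsilon$ with $\varepsilon \sim N(0,1)$, and the hypothetical predictor $f(x) = 2x$. Every event $\{X = x, Y = y\}$ has probability zero, yet the expectation against the joint distribution is perfectly well defined and equals $E[\varepsilon^2] = 1$:

```python
import random

random.seed(0)

def f(x):
    # Hypothetical predictor, chosen to match the toy model Y = 2X + eps.
    return 2 * x

# Monte Carlo estimate of E[(Y - f(X))^2] under the toy joint law.
n = 100_000
total = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    y = 2 * x + random.gauss(0, 1)  # Y = 2X + eps, eps ~ N(0, 1)
    total += (y - f(x)) ** 2

mse = total / n
print(mse)  # close to Var(eps) = 1
```

An integrand of the form $(y-f(x))^2\,\mathbb{P}(X=x, Y=y)$ would be zero everywhere here, which is why the integral must be taken against the joint distribution $\mathbb{P}^{(X,Y)}$ itself (or, when a density $p(x,y)$ exists, written as $\int (y-f(x))^2\, p(x,y)\, dx\, dy$).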