What does Pr(dx, dy) mean?


The book The Elements of Statistical Learning by Hastie and others (page 18) defines the expected value of prediction error as \begin{align} \operatorname{EPE}(f) &= \operatorname E(Y - f(X))^2\\ & = \int [y - f(x)]^2 \Pr(dx, dy) \end{align}

Why is it written like this? Why not as below, to be consistent with the usual definition of an expected value? $$ \operatorname{EPE}(f) = \operatorname E(Y - f(X))^2 = \iint [y - f(x)]^2 \Pr(x,y)\, dx\, dy$$ What does $\Pr(dx, dy)$ even mean?


There are 2 answers below.


By the definition of a density function, the probability that $x$ falls in an infinitesimal interval $[x, x + dx]$ is

$$\Pr(dx) = f(x)\,dx.$$

We can extend this to the two-variable case:

$$\Pr(dx, dy) = f(x, y)\,dx\,dy.$$
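As a numerical sanity check (a sketch, not from the book): pick a made-up joint density $p(x, y) = x + y$ on the unit square, approximate $\Pr(dx, dy) \approx p(x, y)\,dx\,dy$ on a fine grid, and evaluate the EPE integral as a Riemann sum. The density and the candidate predictor $f(x) = x$ are arbitrary choices for illustration; the density is named `p` in the code to avoid clashing with the predictor `f`.

```python
import numpy as np

# Hypothetical joint density on the unit square (illustration only):
# p(x, y) = x + y, which integrates to 1 over [0, 1]^2.
def p(x, y):
    return x + y

# Candidate predictor f(x) = x (also a made-up choice).
def pred(x):
    return x

# Approximate Pr(dx, dy) ≈ p(x, y) dx dy on a grid of cell midpoints.
n = 1000
dx = dy = 1.0 / n
mids = (np.arange(n) + 0.5) * dx
X, Y = np.meshgrid(mids, mids, indexing="ij")
prob_mass = p(X, Y) * dx * dy           # Pr(dx, dy) per grid cell

total = prob_mass.sum()                 # total probability, should be ~1
epe = (((Y - pred(X)) ** 2) * prob_mass).sum()  # EPE as a Riemann sum
```

Here `epe` approximates $\iint [y - f(x)]^2 \Pr(dx, dy)$; for these particular choices the exact value works out to $1/6$.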


They use this notation to avoid having to distinguish the cases where the random variables $X$ and $Y$ are continuous, discrete, or a mix of the two. In general, when you see a notation like $\int_A g(x) P(dx)$ or $\int_A g(x) dP$, you can expand it to $\int_A g(x) p(x) dx$ if $X$ is continuous, and to $\sum_{x \in A} g(x) p(x)$ if $X$ is discrete, where $p(x)$ is the probability density or mass function, respectively, derived[1] from $P(\cdot)$.
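A small sketch of the two expansions (the distributions and the function $g$ here are arbitrary choices, not from the answer): for discrete $X$ the abstract integral $\int g(x)\,P(dx)$ becomes a weighted sum over the pmf, while for continuous $X$ it becomes an ordinary integral against the pdf.

```python
import numpy as np

def g(x):
    return x ** 2

# Discrete X: ∫ g dP expands to a sum over the pmf.
# Example pmf: a fair six-sided die, so E[X^2] = 91/6.
support = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
discrete_int = (g(support) * pmf).sum()

# Continuous X: ∫ g dP expands to ∫ g(x) p(x) dx.
# Example pdf: standard normal, integrated numerically on [-8, 8],
# where E[X^2] = 1 (the variance).
def pdf(x):
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

edges = np.linspace(-8.0, 8.0, 200001)
mids = (edges[:-1] + edges[1:]) / 2
dx = edges[1] - edges[0]
continuous_int = (g(mids) * pdf(mids) * dx).sum()
```

Both computations are instances of the same abstract object $\int g\,dP$; only the dominating measure changes.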

If you want to understand this notation more deeply, you have to read an introductory chapter on measure theory. Since $P(\cdot)$ is a measure, it must be evaluated on sets. The notation $P(dx)$ refers to the measure $P(\cdot)$ evaluated on a small ball around $x$. Recall the interpretation of an integral as a sum: the notation $\int_A g(x) P(dx)$ means cover the set $A$ with small balls $dx$ centered at points $x$, and sum the terms $g(x)P(dx)$.
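The payoff of the measure view is that $P(dx)$ makes sense even when $X$ is neither purely discrete nor purely continuous. A sketch with a made-up mixed distribution: take $P(dx) = F(x + dx) - F(x)$, the measure of the small interval $[x, x + dx]$ read off the CDF, and form the sum $\sum g(x)\,P(dx)$ directly (a Riemann–Stieltjes sum), with no need for a density or a pmf.

```python
import numpy as np

# Hypothetical mixed distribution: an atom of mass 0.5 at x = 0,
# plus Uniform(0, 1) carrying the remaining mass 0.5.
def F(x):
    x = np.asarray(x, dtype=float)
    return np.clip(0.5 + 0.5 * x, 0.0, 1.0) * (x >= 0)

def g(x):
    return x  # so the integral below is E[X] = 0.5*0 + 0.5*(1/2) = 0.25

# P(dx) = F(x + dx) - F(x): the measure of each small interval.
# The jump of F at 0 (the atom) is captured automatically by np.diff.
edges = np.linspace(-1.0, 2.0, 300001)
masses = np.diff(F(edges))                  # P(dx) for every interval
integral = (g(edges[:-1]) * masses).sum()   # ∫ g(x) P(dx)
```

The same code handles purely discrete, purely continuous, and mixed cases, which is exactly why the book writes $\Pr(dx, dy)$ rather than committing to a density.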

[1] $p$ is called the Radon-Nikodym derivative of $P$ with respect to the dominating measure in each case, i.e. the Lebesgue measure in the continuous case, and the counting measure in the discrete case. Confusingly, some authors use the same symbol to denote both the measure and its derivative.