Probability of one PDF being greater than the other


Given two r.v. $x_1$ and $x_2$ and their PDFs $f_1(x)$ and $f_2(x)$, and assuming for simplicity that both are one-dimensional with the same distributional support, I'm interested in the expected value:

$E_{x \sim f_1} \big[ I[f_1(x) \geq f_2(x)] \big] = \int f_1(x) \cdot I[f_1(x) \geq f_2(x)] \, dx,$

where $I$ is an indicator function returning 1 if $f_1(x) \geq f_2(x)$, and 0 otherwise.

Intuitively, the above expression equals the probability that $f_1(x) \geq f_2(x)$ when $x$ is sampled from $f_1$. I'm looking for relations between this expression and other known statistical quantities (e.g. divergences between $x_1$ and $x_2$, discrepancies, distribution distances, etc.) that may resemble it in some way. I would appreciate any help and directions. Thanks
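For concreteness, the quantity can be estimated by plain Monte Carlo. Below is a minimal sketch using two hypothetical unit-variance Gaussians as $f_1$ and $f_2$ (the choice of $N(0,1)$ and $N(1,1)$ is just an illustrative assumption, not anything from the question):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example distributions: x1 ~ N(0, 1), x2 ~ N(1, 1).
f1, f2 = norm(0, 1), norm(1, 1)

rng = np.random.default_rng(0)
samples = f1.rvs(size=100_000, random_state=rng)  # x sampled from f1

# Monte Carlo estimate of Pr_{x ~ f1}[ f1(x) >= f2(x) ].
p_est = np.mean(f1.pdf(samples) >= f2.pdf(samples))

# For these two unit-variance Gaussians, f1(x) >= f2(x) iff x <= 0.5,
# so the exact value is the standard normal CDF at 0.5 (about 0.69).
print(p_est)
```

The indicator inside the expectation is just a boolean comparison of the two densities at each sampled point, so the estimator is the fraction of $f_1$-samples falling in the region where $f_1$ dominates.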

Best answer:

Denote $E_1$ and $E_2$ as:

$E_1 = \int f_1(x) \cdot I[f_1(x) \geq f_2(x)] dx,$

$E_2 = \int f_2(x) \cdot I[f_1(x) \geq f_2(x)] dx$.

Additionally, define $\phi$ and $D_{\phi}$ as:

$\phi(z) = \begin{cases} z - 1,& \text{if } z \geq 1\\ 0, & \text{if } z < 1 \end{cases},$

$D_{\phi}(x_1, x_2) = \int f_2(x) \cdot \phi(\frac{f_1(x)}{f_2(x)}) dx.$

Since $\phi$ is convex and satisfies $\phi(1) = 0$, $D_{\phi}$ is a proper $f$-divergence between the two r.v.'s $x_1$ and $x_2$. Intuitively, it measures how far apart the distributions of $x_1$ and $x_2$ are.

Under the assumed setting that both distributions have identical support, it is easy to show that $D_{\phi}(x_1, x_2) = E_1 - E_2$: since $\phi\left(\frac{f_1(x)}{f_2(x)}\right) = \left(\frac{f_1(x)}{f_2(x)} - 1\right) \cdot I[f_1(x) \geq f_2(x)]$, the integrand of $D_{\phi}$ becomes $(f_1(x) - f_2(x)) \cdot I[f_1(x) \geq f_2(x)]$, whose integral is exactly $E_1 - E_2$.
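The identity $D_{\phi}(x_1, x_2) = E_1 - E_2$ can be checked numerically. The sketch below again assumes two hypothetical unit-variance Gaussians $N(0,1)$ and $N(1,1)$ (for which the indicator region is simply $x \leq 0.5$) and compares the three integrals via quadrature:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Hypothetical example: x1 ~ N(0, 1), x2 ~ N(1, 1); both supported on R.
f1, f2 = norm(0, 1), norm(1, 1)

def indicator(x):
    """I[f1(x) >= f2(x)]; for these Gaussians this is x <= 0.5."""
    return 1.0 if f1.pdf(x) >= f2.pdf(x) else 0.0

def phi(z):
    """phi(z) = z - 1 for z >= 1, else 0 (convex, phi(1) = 0)."""
    return z - 1.0 if z >= 1.0 else 0.0

# E1, E2 and D_phi as defined in the answer; the tails beyond +-10
# are negligible, and points=[0.5] splits the integral at the kink.
E1, _ = quad(lambda x: f1.pdf(x) * indicator(x), -10, 10, points=[0.5])
E2, _ = quad(lambda x: f2.pdf(x) * indicator(x), -10, 10, points=[0.5])
D, _ = quad(lambda x: f2.pdf(x) * phi(f1.pdf(x) / f2.pdf(x)),
            -10, 10, points=[0.5])

print(E1, E2, D)  # D should match E1 - E2
```

Here $E_1$ reduces to the standard normal CDF at $0.5$ and $E_2$ to the CDF at $-0.5$, so $D_{\phi} \approx 0.38$ for this pair.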

In other words, the probability of the event $f_1(x) \geq f_2(x)$ when $x$ is sampled from $f_1$, minus the probability of the event $f_1(x) \geq f_2(x)$ when $x$ is sampled from $f_2$ is equal to the distance $D_{\phi}(x_1, x_2)$ between distributions.


Why/where is it important?

In the context of binary classification, where $f_1$ is the PDF of data with label "one" and $f_2$ is the PDF of data with label "two", we can show that the Bayes optimal classifier (which, under equal class priors, predicts label "one" exactly when $f_1(x) \geq f_2(x)$) has the following performance properties:

$TPR \text{ (true positive rate)} = E_1,$

$FPR \text{ (false positive rate)} = E_2,$

$J \text{ (Youden's J statistic)} = TPR - FPR = E_1 - E_2$

$J$ takes values in $[0, 1]$: $J = 0$ represents bad classification performance (a random guess), and $J = 1$ represents good classification performance (full separation of the two labels).

From the first part it follows that:

$J = D_{\phi}(x_1, x_2)$.

In other words, when the distance between the two distributions is large, the classification (i.e. the distinction between the two labels) is good, and vice versa. While this conclusion is what we would expect in the first place, it is still nice to prove it mathematically :)
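The classification claim can also be checked by simulation. The sketch below assumes equal class priors and the same hypothetical pair $N(0,1)$ and $N(1,1)$ as example class densities, applies the Bayes rule "predict label one iff $f_1(x) \geq f_2(x)$", and compares the resulting Youden's $J$ to the closed-form $E_1 - E_2$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class densities: label "one" ~ N(0, 1), label "two" ~ N(1, 1).
f1, f2 = norm(0, 1), norm(1, 1)

rng = np.random.default_rng(1)
n = 200_000
x_one = f1.rvs(size=n, random_state=rng)  # data with label "one"
x_two = f2.rvs(size=n, random_state=rng)  # data with label "two"

# Bayes-optimal rule under equal priors: predict "one" iff f1(x) >= f2(x).
tpr = np.mean(f1.pdf(x_one) >= f2.pdf(x_one))  # E1 estimate
fpr = np.mean(f1.pdf(x_two) >= f2.pdf(x_two))  # E2 estimate
J = tpr - fpr

# For these Gaussians the decision boundary is x = 0.5, so the exact
# value is norm.cdf(0.5) - norm.cdf(-0.5), about 0.38.
print(tpr, fpr, J)
```

The estimated $J$ agrees with $D_{\phi}(x_1, x_2)$ computed from the closed form, illustrating the identity $J = D_{\phi}(x_1, x_2)$ for this example pair.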