Clarifying expectation with respect to a function, and the Radon-Nikodym derivative


I am trying to understand induced measures in the context of expectation. I have often seen formulas such as the KL divergence written as follows:

$$\text{KL}(p||q) = \int\log\dfrac{p(x)}{q(x)}dp = \int p(x) \log\dfrac{p(x)}{q(x)}dx = \mathbb{E}_{p(x)}[\log\dfrac{p(x)}{q(x)}]$$

which reads as the expectation of the function $\log\dfrac{p(x)}{q(x)}$ with respect to the distribution with density $p(x)$. But over what set is this integral taken?

Is it over the entire real line: $\int_{-\infty}^\infty p(x) \log\dfrac{p(x)}{q(x)}dx$?

Or is it over the support of $p$: $\int_{x:p(x) >0} p(x) \log\dfrac{p(x)}{q(x)}dx$?

For the KL-divergence formula these turn out to be the same, but consider the expectation $\mathbb{E}_{p(x)}[\dfrac{q(x)}{p(x)}] = \int p(x)\dfrac{q(x)}{p(x)}dx = \int q(x)dx$.

In the case of the real line, we have $\int_{-\infty}^\infty q(x)dx = 1$.

In the case of the support of $p$, we have $\int_{x:p(x)>0} q(x)dx$, which is strictly less than $1$ whenever $q$ assigns positive probability outside the support of $p$.
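A small numerical check of the two candidate answers, under an assumed example that is not from the question: $p$ is the Uniform$[0,1]$ density and $q$ is the Uniform$[0,2]$ density, so $q$ puts half of its mass outside the support of $p$. The helper names (`p`, `q`, `midpoint_integral`) are illustrative only. A Monte Carlo estimate of $\mathbb{E}_{p(x)}[\frac{q(x)}{p(x)}]$ only ever evaluates the ratio at points drawn from $p$, i.e. inside the support, so it estimates the integral over the support rather than over the whole real line.

```python
import math
import random

# Assumed example (not from the question): p = Uniform[0, 1], q = Uniform[0, 2].
def p(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def q(x):
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# q vanishes outside [0, 2], so integrating over [0, 2] covers the whole real line.
integral_q_all = midpoint_integral(q, 0.0, 2.0)      # approximately 1
# The support of p is [0, 1].
integral_q_support = midpoint_integral(q, 0.0, 1.0)  # approximately 0.5

# Monte Carlo: every sample drawn from p satisfies p(x) > 0, so q(x)/p(x) is finite.
rng = random.Random(0)
samples = [rng.uniform(0.0, 1.0) for _ in range(100_000)]
mc_ratio = math.fsum(q(x) / p(x) for x in samples) / len(samples)
mc_kl = math.fsum(math.log(p(x) / q(x)) for x in samples) / len(samples)

print(integral_q_all, integral_q_support, mc_ratio, mc_kl)
```

The ratio estimate lands on $0.5 = \int_{x:p(x)>0} q(x)dx$, not on $1$, while the KL estimate gives $\log 2$ either way, since the points where $p(x) = 0$ contribute nothing to it.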

I believe the second is correct, but I was unable to prove this in a way that satisfied me intellectually. My work is as follows:

Interpreting $p$ as the Radon-Nikodym derivative (with respect to Lebesgue measure) of an induced measure $\nu$, and letting $S$ denote the support $S = \{x: p(x) > 0\}$, we can calculate $\int \dfrac{q(x)}{p(x)}dp = \int_S \dfrac{q(x)}{p(x)}dp + \int_{S^C} \dfrac{q(x)}{p(x)}dp$. We have $\nu(S^C) = \int_{S^C} p(x)dx = 0$, so the second integral $\int_{S^C} \dfrac{q(x)}{p(x)}dp$ equals $0$, which leaves us with the integral over the support, as desired.
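The step that discards $S^C$ can be traced back to the definition of the Lebesgue integral of a nonnegative function. A sketch, writing $\nu$ for the measure denoted $dp$ above and $g:\mathbb{R}\to[0,\infty]$ for any measurable function that equals $\frac{q}{p}$ on $S$ and takes arbitrary values (including $+\infty$) on $S^C$:

$$
\int_{S^C} g\,d\nu \;=\; \sup\left\{ \int \varphi\,d\nu \;:\; \varphi \text{ simple},\ 0 \le \varphi \le g\,\mathbf{1}_{S^C} \right\}
$$

Each such $\varphi$ vanishes off $S^C$, so it can be written as $\varphi = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$ with finite $a_i \ge 0$ and $A_i \subseteq S^C$, hence

$$
\int \varphi\,d\nu = \sum_{i=1}^{n} a_i\,\nu(A_i) \le \sum_{i=1}^{n} a_i\,\nu(S^C) = 0
\quad\Longrightarrow\quad
\int_{S^C} g\,d\nu = 0 .
$$

Since simple functions take only finite values, the infinite values of $g$ on $S^C$ never enter the computation. By contrast, rewriting $\int g\,d\nu$ as $\int g(x)\,p(x)\,dx$ and cancelling $p$ against $\frac{1}{p}$ to obtain $q$ is only valid on $S$: on $S^C$ the product $g(x)\,p(x)$ is $0$ (with the convention $0 \cdot \infty = 0$), not $q(x)$.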

To me, this proof seems valid, but it also feels like cheating: I simply look at a set of measure $0$ and declare that I can ignore it, even though the integrand there ($\dfrac{q(x)}{p(x)}$) is infinite (or $\frac{0}{0}$ where $q(x) = 0$ as well). A post on a similar question stated that I could do this, but I don't understand why.

Answer
  • $E_{X \sim p} \log \frac{p(X)}{q(X)}$ is actually shorthand for $E_{X \sim p} f(X)$ where $f(x) = \begin{cases} \log \frac{p(x)}{q(x)} & p(x) > 0 \\ 0 & p(x) = 0\end{cases}$. The assumption of absolute continuity ($q(x)=0$ implies $p(x)=0$) ensures that $f$ is well-defined (no division by zero). This needs to be explicitly baked into the definition of the KL divergence, since "$\log \frac{0}{0}$" and "$0 \log \frac{0}{0}$" are ambiguous.
  • Similarly, if you write $E_{X \sim p} \frac{q(X)}{p(X)}$, you need to be explicit about how you define the function. You could say $E_{X \sim p} g(X)$ where $g(x) = \begin{cases} \frac{q(x)}{p(x)} & p(x) > 0 \\ 0 & p(x) = 0\end{cases}$ if you wish, in which case $E_{X \sim p} g(X) = \int_{\{x : p(x) > 0\}} q(x)\,dx$, but this is something that needs to be stated; $E_{X \sim p} \frac{q(X)}{p(X)}$ alone is ambiguous, since wherever $p(x) = 0 < q(x)$ the integrand $p(x) \cdot \frac{q(x)}{p(x)}$ becomes a $0 \cdot \infty$ situation.
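A minimal sketch of the two conventions above, assuming hypothetical discrete distributions (so sums replace integrals) on three points, where $p$ gives no mass to the last point. The functions `f` and `g` mirror the piecewise definitions in the answer; raising `ValueError` when $p > 0 = q$ is an illustrative way to signal that absolute continuity fails, not part of the answer.

```python
import math

# Hypothetical discrete distributions on the points {0, 1, 2}; p puts no mass on 2.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

def f(pi, qi):
    """log(p/q) where p > 0, and 0 where p = 0; needs q > 0 wherever p > 0."""
    if pi == 0.0:
        return 0.0
    if qi == 0.0:
        raise ValueError("p is not absolutely continuous with respect to q")
    return math.log(pi / qi)

def g(pi, qi):
    """q/p where p > 0, and 0 where p = 0 (the convention has to be chosen explicitly)."""
    return qi / pi if pi > 0.0 else 0.0

kl = sum(pi * f(pi, qi) for pi, qi in zip(p, q))
expected_ratio = sum(pi * g(pi, qi) for pi, qi in zip(p, q))
q_mass_on_support = sum(qi for pi, qi in zip(p, q) if pi > 0.0)

print(kl, expected_ratio, q_mass_on_support)
```

With this choice of `g`, the expectation equals the $q$-mass of the support of $p$ ($0.5$ in this example) rather than $1$; a different value of `g` at the point where $p = 0$ would not change it, because that point has $p$-probability zero.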