Why is it necessary for density functions to be absolutely continuous with respect to a measure in order for the cross entropy to be defined?


In the Wikipedia page describing cross entropy, the following expression is written down to denote the cross entropy $H$ between two densities $p(x)$ and $q(x)$:

$H(p,q) = - \int_\mathcal{X}p(x)\log q(x)dr(x)$

The page mentions that "$p$ and $q$ must be absolutely continuous with respect to some measure $r$ (usually $r$ is a Lebesgue measure on a Borel $\sigma$-algebra)".

The definition of absolute continuity is this: let $I$ be an interval on the real line, and let $(x_k, y_k)$ be a finite sequence of pairwise disjoint sub-intervals of $I$ with $x_k < y_k$. A function $f$ is absolutely continuous on $I$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that $\sum_k (y_k - x_k) < \delta \implies \sum_k|f(y_k) - f(x_k)|<\epsilon$.

I have some intuition for what absolute continuity on $I$ might mean: if I want the output to change by only a tiny amount $\epsilon$, an absolutely continuous function on $I$ guarantees that this holds whenever the total length of the intervals I move over is smaller than some $\delta$. But why is this an important assumption to make in order for the cross entropy to be defined? I just don't see the link at all, and I am looking for some intuition.

Best answer:

"Absolute continuity" has two different definitions depending on the context. (They are related, which is why they share the same name, but the relation is not immediately obvious.)


The definition you cited ($\sum_k (y_k - x_k) < \delta \implies \sum_k |f(y_k) - f(x_k)| < \epsilon$) is absolute continuity of functions. However, what your statement is really about is absolute continuity of measures. Specifically, when they say "$p$ is absolutely continuous with respect to $r$", they mean that the probability measure $P$ whose density is $p$ is absolutely continuous with respect to the measure $r$ (written $P \ll r$: whenever $r(A) = 0$, we also have $P(A) = 0$). The statement is not about the density function $p$ itself.

In fact, for a probability measure to even have a density function with respect to a reference measure is one equivalent characterization of absolute continuity with respect to that reference measure; this equivalence is the Radon–Nikodym theorem. (For instance, in undergraduate probability you deal with "continuous random variables" that have a density function on $\mathbb{R}$; we would say these distributions are absolutely continuous with respect to Lebesgue measure.) If a probability measure $P$ is not absolutely continuous with respect to a reference measure $R$, then there is no density function $p$ satisfying $P(A) = \int_A p(x) \, dR(x)$.
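To make "no density" concrete, here is the standard counterexample: take $P = \delta_0$, the point mass at $0$, and let $\lambda$ be Lebesgue measure. Then $\lambda(\{0\}) = 0$ while $\delta_0(\{0\}) = 1$, so $\delta_0$ is not absolutely continuous with respect to $\lambda$. Indeed, any candidate density $p$ would have to satisfy

$1 = \delta_0(\{0\}) = \int_{\{0\}} p(x) \, d\lambda(x) = 0,$

a contradiction, since any integral over a Lebesgue-null set is zero.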

So, regarding the discussion about cross-entropy, the passage is essentially saying: "in order to even write down the integral $\int p(x) \log q(x) \, dr(x)$, we need the probability measures $P$ and $Q$ to have densities with respect to a common reference measure $r$." If you are able to write down this integral at all, you've already assumed the necessary absolute continuity condition.
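A minimal discrete sketch (a hypothetical example, using the counting measure as the reference measure $r$, with made-up pmf values) also shows what goes wrong at the level of the integrand when $Q$ puts no mass where $P$ does:

```python
import math

# p and q are probability mass functions on the same finite set {0, 1, 2}.
# With counting measure as the reference, the "densities" are just the pmfs.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x), with the convention 0 * log q(x) = 0."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue          # convention: 0 * log q(x) = 0
        if qx == 0.0:
            return math.inf   # p puts mass where q does not -> H = +inf
        total -= px * math.log(qx)
    return total

print(cross_entropy(p, q))   # log 4 ~ 1.3863, finite: q > 0 wherever p > 0

# If q assigns zero mass to a point where p is positive, the integrand
# p(x) log q(x) is -infinity there and the cross entropy blows up:
q_bad = [1.0, 0.0, 0.0]
print(cross_entropy(p, q_bad))   # inf
```

This is the discrete shadow of the support condition: in the continuous setting the analogous failure is $q(x) = 0$ on a set where $p(x) > 0$, which is exactly what absolute continuity conditions rule out or, in the case of $H(p,q) = +\infty$, make explicit.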


Finally, the way the two separate notions of absolute continuity are related is through the CDF (not the density function). Specifically, the measure $P$ is absolutely continuous w.r.t. the Lebesgue measure if and only if the function $F(x) = P((-\infty, x])$ is absolutely continuous [as a function] on $\mathbb{R}$.
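For example, if $P$ has density $p$ with respect to Lebesgue measure, then

$F(x) = P((-\infty, x]) = \int_{-\infty}^{x} p(t) \, dt,$

and an indefinite integral of an integrable function is the prototypical absolutely continuous function, with $F'(x) = p(x)$ for almost every $x$ by the Lebesgue differentiation theorem. Conversely, an absolutely continuous $F$ is recovered from its a.e. derivative by integration, which hands back a density.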