Integrating a function of measures


I've been reading John Baez's series of posts on Information Geometry and am currently on part 6. Midway through the post he discusses Radon-Nikodym derivatives:

The formula for information gain looks more slick: $$\int_\Omega \log\left(\frac{d\mu}{d\nu}\right)d\mu$$ And by the way, in case you’re wondering, the $d$ here doesn’t actually mean much: we’re just so brainwashed into wanting a $dx$ in our integrals that people often use $d\mu$ for a measure even though the simpler notation $\mu$ might be more logical. So, the function $\frac{d\mu}{d\nu}$ is really just a ratio of probability measures, but people call it a Radon-Nikodym derivative, because it looks like a derivative (and in some important examples it actually is). So, if I were talking to myself, I could have shortened this blog entry immensely by working directly with probability measures, leaving out the $d$'s, and saying:

Suppose $\mu$ and $\nu$ are probability measures; then the entropy of $\mu$ relative to $\nu$, or information gain, is $$S(\mu,\nu) = \int_\Omega \log\left(\frac{\mu}{\nu}\right)\mu$$
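To make this notation concrete, here is a minimal Python sketch of the finite case only, assuming $\Omega$ is a finite set and each probability measure is given as a dict of point masses (the function name `relative_entropy` is hypothetical). In that case the ratio $\frac{\mu}{\nu}$ really is a pointwise ratio of numbers, $\mu(\{\omega\})/\nu(\{\omega\})$, which coincides with the Radon-Nikodym derivative wherever $\nu(\{\omega\}) > 0$, and the integral is a finite sum.

```python
import math


def relative_entropy(mu: dict, nu: dict) -> float:
    """S(mu, nu) = sum over omega of log(mu{omega} / nu{omega}) * mu{omega},
    for probability measures on a finite set given as point masses."""
    total = 0.0
    for omega, m in mu.items():
        if m == 0:
            continue  # points with zero mu-mass contribute nothing (0 log 0 = 0)
        n = nu.get(omega, 0.0)
        if n == 0:
            # mu charges a point that nu does not: mu is not absolutely
            # continuous w.r.t. nu, and the relative entropy is infinite
            return math.inf
        # On a finite set, (dmu/dnu)(omega) is just the ratio of point masses
        total += math.log(m / n) * m
    return total
```

For example, `relative_entropy({"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75})` equals $\frac12\log 2 + \frac12\log\frac23 = \frac12\log\frac43 \approx 0.1438$.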

I understand the integral when it is written with (the log of) the Radon-Nikodym derivative: that is just a function on points of $\Omega$, so the integral is an ordinary Lebesgue integral with respect to $\mu$. However, I don't understand how the second integral is defined. $\log\left(\frac{\mu}{\nu}\right)$ isn't a function on points of $\Omega$; if anything it is a function on subsets of $\Omega$ (and it clearly isn't itself a measure). What's the right way of thinking about this integral?
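A short derivation may help connect the two notations. It assumes, as the question does further down, that $\mu$ and $\nu$ have densities $p$ and $q$ with respect to Lebesgue measure, i.e. $\mu(A) = \int_A p\,dx$ and $\nu(A) = \int_A q\,dx$, with $q > 0$ almost everywhere on $\{p > 0\}$ so that $\mu$ is absolutely continuous with respect to $\nu$. The symbols $p$ and $q$ are introduced here for illustration. Then

$$\frac{d\mu}{d\nu}(x) = \frac{p(x)}{q(x)} \quad (\nu\text{-a.e.}), \qquad \int_\Omega \log\left(\frac{d\mu}{d\nu}\right)d\mu = \int_\Omega p(x)\log\left(\frac{p(x)}{q(x)}\right)dx,$$

and for any measurable $A$ with $\nu(A) > 0$,

$$\frac{\mu(A)}{\nu(A)} = \frac{1}{\nu(A)}\int_A \frac{d\mu}{d\nu}\,d\nu.$$

The second identity gives one reading of $\frac{\mu}{\nu}$ as a function of subsets: on a set $A$ it is the $\nu$-average of the pointwise derivative $\frac{d\mu}{d\nu}$ over $A$.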

Intuitively, my first instinct is to break $\Omega$ into a collection of disjoint subsets whose $\mu$-measures are all bounded by some $\varepsilon$, and take the limit of a sum over these sets as $\varepsilon$ decreases. For now, assume all the measures involved are dominated by (absolutely continuous with respect to) Lebesgue measure. Something like: let $\{A_i\}$ be a collection of subsets of $\Omega$ such that

  • $\cup_i A_i = \Omega$
  • $A_i \cap A_j = \emptyset$ when $i \ne j$
  • $\max_i \mu(A_i) < \varepsilon$

Then $$ \int_\Omega \log\left(\frac{\mu}{\nu}\right)\mu \equiv \lim_{\varepsilon \to 0} \sum_i \log\left(\frac{\mu(A_i)}{\nu(A_i)}\right)\mu(A_i) $$
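The partition idea can be checked numerically in a special case. The sketch below works under assumptions that are not in the question: $\Omega = \mathbb{R}$, $\mu$ and $\nu$ are one-dimensional Gaussians (via `statistics.NormalDist` from the Python standard library), and the $A_i$ are intervals cut at the $\mu$-quantiles $k/n$, so that $\max_i \mu(A_i) = 1/n$ plays the role of $\varepsilon$. The helper names `partition_sum` and `gaussian_kl` are hypothetical. It compares the finite sum with the closed-form value of $\int \log\left(\frac{d\mu}{d\nu}\right)d\mu$ for two Gaussians.

```python
import math
from statistics import NormalDist


def partition_sum(mu: NormalDist, nu: NormalDist, n: int) -> float:
    """Sum over i of log(mu(A_i) / nu(A_i)) * mu(A_i), where the A_i are
    n intervals partitioning the real line, each with mu-mass 1/n."""
    # Cut points at the mu-quantiles k/n, k = 1..n-1
    cuts = [mu.inv_cdf(k / n) for k in range(1, n)]
    # nu-CDF at the cut points, padded with its limits at -inf and +inf
    nu_cdf = [0.0] + [nu.cdf(x) for x in cuts] + [1.0]
    mu_a = 1.0 / n  # mu-mass of every cell, by construction
    total = 0.0
    for i in range(n):
        nu_a = nu_cdf[i + 1] - nu_cdf[i]
        total += math.log(mu_a / nu_a) * mu_a
    return total


def gaussian_kl(mu: NormalDist, nu: NormalDist) -> float:
    """Closed-form integral of log(dmu/dnu) dmu for two 1-D Gaussians."""
    return (math.log(nu.stdev / mu.stdev)
            + (mu.variance + (mu.mean - nu.mean) ** 2) / (2 * nu.variance)
            - 0.5)


if __name__ == "__main__":
    mu = NormalDist(0.0, 1.0)
    nu = NormalDist(1.0, 1.5)
    for n in (10, 100, 1000, 10000):
        print(n, partition_sum(mu, nu, n))
    print("exact", gaussian_kl(mu, nu))
```

Here each partition with $n$ cells is refined by the next one, and merging cells can only lower the sum (a consequence of the log-sum inequality), so in this example the sums should increase toward the closed-form value as $n$ grows.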

Clearly that isn't terribly rigorous, but is this on the right track conceptually? Or am I just deeply confused?