Notation of mutual information for continuous variable


I'm reading the paper 'MINE: Mutual Information Neural Estimation.' In this paper, the mutual information is written as $$ I(X;Z) = \int_{\mathcal{X} \times \mathcal{Z}} \log \frac{d \mathbb{P}_{XZ}}{d (\mathbb{P}_{X} \otimes \mathbb{P}_{Z})} \, d \mathbb{P}_{XZ}.$$ Similarly, the KL divergence is written as $$D_{KL} (\mathbb{P} \,\Vert\, \mathbb{Q}) = \mathbb{E}_\mathbb{P} \Bigl[ \log \frac{d \mathbb{P}}{d \mathbb{Q}} \Bigr].$$

But I'm familiar with other notations (from Wikipedia):
MI for continuous variables, $$ I(X;Z) = \int_{\mathcal{Z}} \int_{\mathcal{X}} P_{(X,Z)} (x,z) \log \Bigl( \frac{P_{(X,Z)}(x,z)}{P_X (x)\, P_Z (z)} \Bigr)\, dx\, dz, $$ and KL divergence (in its discrete form), $$D_{KL} (P \,\Vert\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$
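For context on how I'm trying to reconcile the two (this is my own attempt, so I may have it wrong): if the joint distribution $\mathbb{P}_{XZ}$ and the product $\mathbb{P}_X \otimes \mathbb{P}_Z$ both have densities with respect to Lebesgue measure, then I believe the Radon–Nikodym derivative should reduce to the density ratio,

$$ \frac{d \mathbb{P}_{XZ}}{d(\mathbb{P}_X \otimes \mathbb{P}_Z)}(x,z) = \frac{p_{XZ}(x,z)}{p_X(x)\, p_Z(z)}, \qquad d\mathbb{P}_{XZ} = p_{XZ}(x,z)\, dx\, dz, $$

which would turn the measure-theoretic integral into the familiar double integral above. Is that the right way to read it?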

My questions are:

  1. Why do they use blackboard-bold notation like $\mathbb{P}_X$ instead of $P(x)$?
  2. Why is it $d \mathbb{P}_X$ and not just $\mathbb{P}_X$?
  3. Why does the sum $\sum_{x} P(x) \log \frac{P(x)}{Q(x)}$ become an expectation $\mathbb{E}_\mathbb{P}[\cdot]$?

I find the notation in this paper quite confusing. Is there a reason they use this type of notation?
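To make question 3 concrete, here is a small numerical check I tried (plain NumPy; `P` and `Q` are toy distributions I made up), comparing the explicit weighted sum with a Monte Carlo estimate of the expectation under $P$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete distributions over 4 outcomes (made-up numbers).
P = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([0.25, 0.25, 0.25, 0.25])

# KL as an explicit weighted sum: sum_x P(x) * log(P(x)/Q(x))
kl_sum = np.sum(P * np.log(P / Q))

# KL as an expectation under P: E_{x~P}[log(P(x)/Q(x))],
# estimated by sampling x from P and averaging the log-ratio.
samples = rng.choice(len(P), size=200_000, p=P)
kl_mc = np.mean(np.log(P[samples] / Q[samples]))

print(kl_sum, kl_mc)  # the two values should be close
```

The sum and the sampled expectation agree (up to Monte Carlo noise), which is what I understand "$\sum_x P(x) \, [\cdot]$ becomes $\mathbb{E}_P[\cdot]$" to mean, but I'd appreciate confirmation.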