Equivalent definitions of mutual information of continuous random variables

I am reading Elements of Information Theory by Cover and Thomas (2006) and am struggling with the definition of mutual information for continuous random variables (Chapter 9: Differential Entropy). For two random variables with a joint pdf $f(x, y)$, they define the mutual information as \begin{equation} I(X;Y) = \int f(x, y) \log \frac{f(x, y)}{f(x)f(y)} \,\mathrm{d}x\,\mathrm{d}y. \end{equation}

Later, they give a more general definition in terms of the mutual information of discrete random variables: \begin{equation} I(X;Y) = \sup_{P, Q} I([X]_P; [Y]_Q), \end{equation} where $[X]_P$ and $[Y]_Q$ are the quantizations of $X$ and $Y$ with respect to finite partitions $P$ and $Q$ of their ranges.

They then state that the two definitions are equivalent for random variables with a density, and that this can be shown along the lines of their proof that the mutual information of continuous random variables is the limit of the mutual information of their quantized versions. That proof relies on a theorem stating that $H(X^{\Delta}) + \log\Delta \rightarrow h(X)$ as $\Delta \rightarrow 0$, where $\Delta$ is the bin length used for uniform quantization of $X$ and $X^{\Delta}$ is the corresponding quantized version of $X$.

But the general definition takes the supremum over arbitrary finite partitions, not necessarily uniform ones, and without uniformity the theorem above gives no limit for the quantization. I really cannot figure out the proof of equivalence. Can anyone help?
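For reference, here is the uniform-quantization limit that I do follow, applying the theorem to $X$, $Y$, and the pair $(X, Y)$ (the joint bins have area $\Delta^2$, hence the $2\log\Delta$): \begin{align} I(X^{\Delta}; Y^{\Delta}) &= H(X^{\Delta}) + H(Y^{\Delta}) - H(X^{\Delta}, Y^{\Delta}) \\ &= \big(H(X^{\Delta}) + \log\Delta\big) + \big(H(Y^{\Delta}) + \log\Delta\big) - \big(H(X^{\Delta}, Y^{\Delta}) + 2\log\Delta\big) \\ &\rightarrow h(X) + h(Y) - h(X, Y) = I(X;Y) \quad \text{as } \Delta \rightarrow 0, \end{align} since the $\log\Delta$ terms cancel. My difficulty is only with extending this to the supremum over arbitrary partitions.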
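As a numerical sanity check on the uniform case (my own sketch, not from the book), here is a plug-in estimate of $I(X^{\Delta}; Y^{\Delta})$ for a bivariate Gaussian with correlation $\rho$, for which the closed form is $I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)$ nats; the function name `quantized_mi` and all parameters are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
# Closed-form mutual information of a bivariate Gaussian (in nats):
exact = -0.5 * np.log(1 - rho**2)

# Draw samples from the bivariate Gaussian
n = 1_000_000
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

def quantized_mi(xy, delta):
    """Plug-in estimate of I(X^Delta; Y^Delta) on a uniform grid of width delta."""
    edges = np.arange(-6.0, 6.0 + delta, delta)  # grid covering ~6 sigma
    joint, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges])
    p = joint / joint.sum()                      # empirical joint pmf of the bins
    px = p.sum(axis=1, keepdims=True)            # marginal of X^Delta
    py = p.sum(axis=0, keepdims=True)            # marginal of Y^Delta
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

for delta in [2.0, 1.0, 0.5, 0.25, 0.1]:
    print(f"delta={delta:5.2f}  I_quantized={quantized_mi(xy, delta):.4f}  exact={exact:.4f}")
```

The estimates do climb towards the continuous value as $\Delta$ shrinks (up to finite-sample bias for very small bins), but of course this only illustrates the uniform case, not the supremum over arbitrary partitions that the general definition requires.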