The mutual information is defined on random variables. That is, $I(X;Y)$ denotes the mutual information between random variables $X$ and $Y$.
On the other hand, the Kullback-Leibler divergence is defined on probability distributions; i.e., $D_{\mathrm{KL}}(P\|Q)$ denotes the KL divergence of the probability distribution $P$ with respect to the probability distribution $Q$.
Why aren't $I$ and $D_{\mathrm{KL}}$ both defined on random variables (or both defined on probability distributions)?
I think it is actually natural to use different notation in this case. Mutual information is defined on the joint distribution of $X$ and $Y$, and if it were written as a function of the joint, say $I(p(X,Y))$, there could be confusion in higher dimensions. Suppose $X = [x_1, \ldots, x_n]$ and $Y = [x_{n+1}, \ldots, x_{n+m}]$. Then writing $I(p(x_1, \ldots, x_{n+m}))$ would be ambiguous without specifying which variables form $X$ and which form $Y$.
On the other hand, the Kullback-Leibler divergence compares two distributions of $X$ and $Y$ taken separately; any joint dependence between them is ignored. Hence writing $D_{\mathrm{KL}}(X\|Y)$ would be confusing if the variables are in fact dependent, since the notation would suggest the joint matters when it does not. In this case writing the distributions themselves makes more sense.
And of course, you know that mutual information is just the Kullback-Leibler divergence of the joint distribution with respect to the product of marginals:
$$I(X;Y) = D_{\mathrm{KL}}\big(p(X,Y)\,\|\,p(X)\,p(Y)\big).$$
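That identity is easy to check numerically. Below is a minimal sketch, assuming a small made-up joint distribution over two binary variables (the distribution and the helper function names are illustrative, not from the question):

```python
import numpy as np

# Hypothetical joint distribution p(X, Y) over two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

# Marginals: sum out the other variable.
p_x = p_xy.sum(axis=1)  # p(X)
p_y = p_xy.sum(axis=0)  # p(Y)

# Product of marginals p(X)p(Y), the distribution X and Y would
# have if they were independent.
p_prod = np.outer(p_x, p_y)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)), in nats."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0  # 0 * log(0/q) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Mutual information as the KL divergence of the joint from the
# product of marginals.
mi = kl_divergence(p_xy, p_prod)
```

With a joint of this kind (off-diagonal mass smaller than diagonal mass), the variables are dependent and `mi` comes out strictly positive; for an independent joint it would be exactly zero.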