Strange syntax in information theory

I recently started reading some information theory texts and was immediately struck by the strangeness of the syntactic choices for some basic concepts. For example:

  • KL-divergence is written as $D(P||Q)$ (roughly; I can't even figure out how to write the double bars appropriately in TeX) instead of $D(P, Q)$
  • Mutual information is often written as $I(X; Y)$ instead of $I(X, Y)$
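
One commonly cited reason for the asymmetric-looking bars in $D(P\|Q)$ is that KL divergence is not symmetric: in general $D(P\|Q) \neq D(Q\|P)$, so a notation like $D(P, Q)$ would wrongly suggest a symmetric distance. A minimal Python sketch illustrating this (the function name is my own):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) for discrete
    distributions given as probability lists (in nats)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]
Q = [0.9, 0.1]

# The two directions give different values:
print(kl_divergence(P, Q))  # ~0.511
print(kl_divergence(Q, P))  # ~0.368
```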

There may be more examples that I haven't encountered yet. I'm curious if anyone has (historical) insight into why this came to be.

There is 1 answer below.

Regarding mutual information: there are cases where one is interested in the mutual information between sets of random variables, say between $\{X_1, X_2\}$ and $\{Y_1, Y_2, Y_3\}$. In that case the notation $I(X_1, X_2; Y_1, Y_2, Y_3)$ is more convenient than $I(\{X_1, X_2\}, \{Y_1, Y_2, Y_3\})$: the semicolon separates the two groups, while commas separate variables within a group.

P.S.: you can write $\|$ in TeX for the double bars.
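
Incidentally, the two notations are related: mutual information is itself a KL divergence between the joint distribution and the product of the marginals, $I(X;Y) = D(p_{X,Y} \,\|\, p_X p_Y)$. A small self-contained Python sketch of this identity (the helper name is my own):

```python
import math

def mutual_information(joint):
    """I(X;Y) = D(p_{XY} || p_X p_Y) for a joint distribution
    given as a 2-D list of probabilities (in nats)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    return sum(
        p * math.log(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Independent X, Y: joint equals the product of marginals, so I(X;Y) = 0.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Perfectly correlated X, Y: I(X;Y) = ln 2.
correlated = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent))  # 0.0
print(mutual_information(correlated))   # ~0.693 (= ln 2)
```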