Why does the common information $C\equiv X\wedge Y$ give $H(X)=H(CX)$?


Given two random variables $X,Y$, their common information $X\wedge Y$ is defined in (Wolf and Wullschleger 2004) as the random variable $X\wedge Y=f_X(X)=f_Y(Y)$ constructed as follows:

  1. Let $G$ be the graph whose set of vertices is the union of the possible events for $X$ and $Y$, which we write as $\mathcal X\cup\mathcal Y$. Let's say that any two vertices $x\in\mathcal X$ and $y\in\mathcal Y$ of this graph $G$ are connected by an edge if $P_{XY}(x,y)>0$.
  2. Let $f_X:\mathcal X\to 2^{\mathcal X\cup\mathcal Y}$ be the function sending each vertex $x\in\mathcal X$ to the connected component of $G$ containing $x$. In words, this sends each input event into the full set of vertices of $G$ which are (not necessarily directly) connected to $x$. Same for $f_Y:\mathcal Y\to 2^{\mathcal X\cup\mathcal Y}$.

This random variable is used to discuss zero-error transmission rates and other things.

The authors go on to show that we have the identity $$I(X:Y) = H(X\wedge Y) + I(X:Y|X\wedge Y),$$ and to prove this they use the simpler identity $H(X)=H(XC)$, with $C\equiv X\wedge Y$. I'm trying to get a better understanding of why $H(X)=H(XC)$ holds. It seems like a simple enough statement, but I'm not seeing how it's proved, probably because I haven't really fully understood the intuition behind the above definition of common information.


Best answer:

It's because $C$ is a function of $X$ (namely $C=f_X(X)$), and for any function $f$ we always have $H(X, f(X)) = H(X)$: $$H(X, f(X)) = \\ - \sum_x \sum_t P(X = x, f(X) = t) \log P(X = x, f(X) = t) =\\ -\sum_x P(X = x, f(X) = f(x)) \log P(X = x, f(X) = f(x)) = \\ -\sum_x P(X = x) \log P(X = x) = H(X).$$

The second equality is because $P(X = x, f(X) = t) = 0$ if $t \neq f(x)$, and the third is because $P(X = x, f(X) = f(x)) = P(X = x)$.

Intuitively, adding more states (we now know the value not only of $X$ but also of $f(X)$) can't decrease the entropy, but we also can't create entropy from nowhere by merely adding a redundant description of data we already had.
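As a quick numerical illustration of this identity (the distribution and the non-injective function $f$ below are arbitrary choices of mine):

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# An arbitrary X and an arbitrary (non-injective) function f of it.
p_x = {1: 0.5, 2: 0.25, 3: 0.25}
f = {1: 'a', 2: 'a', 3: 'b'}

# The joint pmf of (X, f(X)): all the mass sits on the pairs (x, f(x)),
# so its nonzero probabilities are the same as those of X alone.
p_joint = {(x, f[x]): p for x, p in p_x.items()}

print(entropy(p_x), entropy(p_joint))  # 1.5 1.5
```

The joint distribution of $(X, f(X))$ has exactly the same nonzero probabilities as $X$ itself, so the two entropies coincide.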


I'll complement the other answer with some additional explanation for the general definition of common information, which helped me get a better idea of the proof at issue, among other things. Suppose we have four possible inputs and four possible outputs, with labels $\{1,2,3,4\}$ for both, and a joint probability distribution $$p(1,1)= p(1,2) = \frac18, \quad p(2,3) = \frac14, \quad p(3,3) = p(3,4) = \frac18, \quad p(4,4) = \frac14.$$ We can schematically represent this distribution as a bipartite graph, where vertices on the left (right) correspond to possible values of $X$ ($Y$), edges connect the pairs with nonzero joint probability, and the colors correspond to the two possible values of the random variable $C$ discussed below.

Then:

  1. The corresponding graph $G$ has precisely two connected components: $C_1\equiv \{1_i,1_o,2_o\}$, and $C_2\equiv\{2_i,3_i,4_i,3_o, 4_o\}$, where the subscripts $i,o$ are used to specify whether we're referring to a vertex that comes from an "input" or an "output" event.

  2. The random variable $X\wedge Y$ has two possible events, corresponding to the two connected components specified above. The associated probability distribution assigns to $\{1_i,1_o,2_o\}$ the probability $\frac14$, and to $\{2_i,3_i,4_i,3_o, 4_o\}$ the probability $\frac34$.

  3. Knowing $X$ clearly also determines $X\wedge Y$ (the latter is writable as a function of the former, after all). Thus $H(X)=H(XC)$. And similarly, $H(Y)=H(YC)$.

  4. The "zero-error information" is $H(X\wedge Y) = H(1/4,3/4)=2-\frac34\log_23$. To compute the mutual information conditioned on the common information, we average the mutual information conditioned on each connected component. Conditioned on $C_1$ the mutual information is $I(X:Y|C_1)=0$, because there is only a single possible input value, while conditioned on $C_2$ we have $I(X:Y|C_2)=\frac23$, and thus $I(X:Y|X\wedge Y) = \frac34 I(X:Y|C_2)=\frac12$.

  5. The mutual information equals $I(X:Y)=\frac52- \frac34\log_2 3$, consistent with $$I(X:Y) = I(X:Y|X\wedge Y) + H(X\wedge Y).$$
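These numbers can be sanity-checked numerically. A sketch (helper names are mine, and the components are hard-coded from the inspection in step 1 rather than recomputed):

```python
import math

def H(probs):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X:Y) = H(X) + H(Y) - H(X,Y) for a joint pmf {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), pr in joint.items():
        px[x] = px.get(x, 0) + pr
        py[y] = py.get(y, 0) + pr
    return H(px.values()) + H(py.values()) - H(joint.values())

# The joint distribution of the example above.
p = {(1, 1): 1/8, (1, 2): 1/8, (2, 3): 1/4,
     (3, 3): 1/8, (3, 4): 1/8, (4, 4): 1/4}

# Connected components found by inspection in step 1, keyed by the value of X.
comp_of_x = {1: 'C1', 2: 'C2', 3: 'C2', 4: 'C2'}

# P(C): probability of each component.
p_c = {}
for (x, y), pr in p.items():
    p_c[comp_of_x[x]] = p_c.get(comp_of_x[x], 0) + pr

# I(X:Y | X^Y) = sum_c P(C=c) * I(X:Y | C=c), with the conditionals renormalised.
i_cond = sum(
    pc * mutual_information({(x, y): pr / pc
                             for (x, y), pr in p.items() if comp_of_x[x] == c})
    for c, pc in p_c.items()
)

h_c = H(p_c.values())
print(h_c, i_cond, mutual_information(p))  # ≈ 0.811, 0.5, 1.311
```

Indeed `h_c + i_cond` equals `mutual_information(p)`, i.e. $\left(2-\frac34\log_23\right) + \frac12 = \frac52-\frac34\log_23$.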

Looking back at the figure above, the intuition is clear: the only way to send information error-free is to encode it in the input using different "colours", that is, encoding "0" as $X=1$ and "1" as any of $X=2,3,4$. The common information $C$ has two possible values corresponding to the two colors (in turn representing the two connected components). Conditioned on $C$, there is no way to use the top connected component (purple region) to send further information, and thus $I(X:Y|C_1)=0$. On the other hand, the bottom connected component (green region) still allows sending some information, albeit not error-free: if $X=2$ or $X=4$ the information is transmitted reliably, but whenever $X=3$ it is lost.