Do formulas of Shannon information remain valid under a conditioning operation?


Does equality remain valid after the operation of adding conditioning to a linear Shannon equation? If so, how can this be proved?

Let $X,Y,Z$ be random variables. Let us tentatively call (conditional) entropy $H$ and mutual information $I$ Shannon measures, and consider formulas in which a sum of Shannon measures equals zero; let us call such an equation a linear Shannon equation. The operation of adding conditioning is to add a random variable as a given condition to each Shannon measure in the equation.

For example, the following equality is a sum of Shannon measures equal to zero, so it is a linear Shannon equation: $$H(X,Y) - H(X)-H(Y)+I(X;Y)=0.$$ Performing the operation of adding conditioning with $Z$ as the given condition turns it into $$H(X,Y\mid Z) - H(X\mid Z)-H(Y\mid Z)+I(X;Y\mid Z)=0.$$ This conditioned equation also holds, directly from the definitions.
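As a sanity check, the unconditioned identity can be verified numerically for an arbitrary joint pmf. A short Python sketch (the 2x3 pmf below is just an illustrative choice), computing each term from its definition:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zeros."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# An arbitrary joint pmf p(x, y) on a 2x3 alphabet (rows: x, columns: y).
pxy = np.array([[0.10, 0.20, 0.05],
                [0.25, 0.15, 0.25]])
px = pxy.sum(axis=1)
py = pxy.sum(axis=0)

# Mutual information computed directly from its definition.
mask = pxy > 0
I_XY = np.sum(pxy[mask] * np.log2((pxy / np.outer(px, py))[mask]))

# H(X,Y) - H(X) - H(Y) + I(X;Y) should vanish up to floating-point error.
residual = entropy(pxy) - entropy(px) - entropy(py) + I_XY
print(abs(residual) < 1e-12)   # True
```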

My question is whether the operation of adding conditioning always preserves the validity of a linear Shannon equation. For the above example the answer is yes; what about in general?

If the equality holds in general, I would like to see a proof or a reference to a known proof.

My opinion -- I believe the equality holds in general, and that it can be demonstrated using the I-measure [1], which asserts that Shannon measures can be regarded as a signed measure. The conditioning operation then corresponds to a relative complement, and the problem reduces to elementary set algebra. However, I also suspect there is a (well-known?) simple proof that avoids the I-measure.

[1] See Ch. 6 in Yeung, "A First Course in Information Theory", Springer, 2002. (http://iest2.ie.cuhk.edu.hk/~whyeung/post/draft7.pdf)

Best answer:

You need to distinguish between the case where the linear Shannon equation holds only for particular random variables and the case where it holds for all random variables.

1.) Assume first that the linear Shannon equation holds only for particular random variables $X,Y$. Then your statement is incorrect, by the following counterexample. Let $X,Y$ be independent random variables with the same positive entropy, so your Shannon equation reads $H(X)-H(Y) = 0$. Conditioning on $Z=X$ gives $H(X|X) - H(Y|X) = 0$, i.e., $H(Y|X) = 0$; but by independence $H(Y|X)=H(Y) > 0$, a contradiction.
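This counterexample can be checked numerically. A sketch in Python, encoding $X,Y$ as independent fair bits and $Z=X$ in a joint pmf, with conditional entropies obtained via the chain rule:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zeros."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint pmf p(x, y, z) with X, Y independent fair bits and Z = X.
p = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p[x, y, x] = 0.25

H_X = entropy(p.sum(axis=(1, 2)))
H_Y = entropy(p.sum(axis=(0, 2)))
# Conditional entropies via the chain rule: H(A|Z) = H(A,Z) - H(Z).
H_Z = entropy(p.sum(axis=(0, 1)))
H_X_given_Z = entropy(p.sum(axis=1)) - H_Z
H_Y_given_Z = entropy(p.sum(axis=0)) - H_Z

print(H_X - H_Y)                   # 0.0  -- the unconditioned equation holds
print(H_X_given_Z - H_Y_given_Z)   # -1.0 -- the conditioned version fails
```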

2.) Now assume the equation holds for all random variables $X,Y$, as in your example. In this case, let $X_z$ denote the random variable $X$ conditioned on $Z=z$, and define $Y_z$ analogously (these are again valid random variables). The equation then also holds for $X_z$ and $Y_z$, since it is true for all RVs by assumption. Multiplying the equation by $P(Z=z)$ and summing over all $z$ gives the desired equality. This is because, for example, $$ \sum_z P(Z=z) H(X_z) = \sum_z P(Z=z) H(X|Z=z) = H(X|Z). $$ The analogous statement is also true for conditional entropy and mutual information. So you can introduce conditioning whenever the equation holds for all random variables $X,Y$.
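This averaging argument can be illustrated numerically for the example identity $H(X,Y)-H(X)-H(Y)+I(X;Y)=0$: evaluate its left-hand side on each conditioned pair $(X_z, Y_z)$ and average with weights $P(Z=z)$. A sketch (the random joint pmf is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zeros."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def residual(pxy):
    """Evaluate H(X,Y) - H(X) - H(Y) + I(X;Y) on a joint pmf p(x, y)."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0
    I = np.sum(pxy[mask] * np.log2((pxy / np.outer(px, py))[mask]))
    return entropy(pxy) - entropy(px) - entropy(py) + I

# A random joint pmf p(x, y, z) on a 3x4x2 alphabet.
p = rng.random((3, 4, 2))
p /= p.sum()
pz = p.sum(axis=(0, 1))

# The unconditioned identity holds for each conditioned pair (X_z, Y_z);
# the P(Z=z)-weighted average of its terms yields exactly the conditioned
# identity, so the averaged residual vanishes as well.
avg = sum(pz[z] * residual(p[:, :, z] / pz[z]) for z in range(pz.size))
print(abs(avg) < 1e-12)   # True
```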

Note on conditioning on RVs:

TLDR: Conditioning RVs on RVs is "not quite sound" [1], but you can condition RVs on events [2]. Entropy can be conditioned on events and RVs.

It is important to distinguish between conditioning on the RV $Z$ and conditioning on the event $Z=z$. In the exposition above we first condition the random variables $X$ and $Y$ on $Z=z$, which is conditioning on an event (not on a random variable). Here $X_z$ (or $X|Z=z$) is a random variable with probability mass function $P(X=x|Z=z)$; for each given $z$, we get a new random variable $X_z$.

The conditional entropy given the event $Z=z$ is then defined as the entropy of the conditioned RV $X_z$, i.e., $$H(X|Z=z) = -\sum_x P(X=x|Z=z) \log P(X=x|Z=z),$$ which is a function of $z$ (one entropy value per value of $z$). The conditional entropy $H(X|Z)$ is in turn defined as the average conditional entropy, $$H(X|Z) = \sum_z P(Z=z) H(X|Z=z),$$ which is simply a number. As pointed out in [3, Sect. 2.2], the naturality of this definition comes from the chain rule of entropy, which states that the joint entropy satisfies $H(X,Z) = H(Z) + H(X|Z)$.
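A short sketch of these definitions (the joint pmf is an arbitrary illustrative choice): compute $H(X|Z=z)$ for each event, average to get the number $H(X|Z)$, and confirm the chain rule.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zeros."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# An arbitrary joint pmf p(x, z) on a 3x2 alphabet (rows: x, columns: z).
pxz = np.array([[0.05, 0.25],
                [0.35, 0.10],
                [0.10, 0.15]])
pz = pxz.sum(axis=0)

# H(X|Z=z): entropy of the conditional pmf p(x|z), one value per event Z=z.
H_X_given_z = np.array([entropy(pxz[:, z] / pz[z]) for z in range(pz.size)])

# H(X|Z): the average conditional entropy, a single number.
H_X_given_Z = np.dot(pz, H_X_given_z)

# Chain rule: H(X,Z) = H(Z) + H(X|Z).
print(np.isclose(entropy(pxz), entropy(pz) + H_X_given_Z))   # True
```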

[1] How to formalize "conditional random variables"
[2] What is the definition of $X|(Y=y)$?
[3] Cover, T. M. and Thomas, J. A., "Elements of Information Theory", Wiley, 2006