Proof of conditional entropy $H( X , Y | Z ) = H( X | Z ) + H( Y | X , Z )$


In information theory, the joint entropy $H(X,Y)$ of a pair of discrete random variables $(X,Y)$ is defined as:

$$ H(X,Y) = -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}p(x,y)\log_2 p(x,y)\tag{1}\label{eq1} $$

And the conditional entropy $H(Y \mid X)$ of the same $(X,Y)$ as:

$$ H(Y \mid X) = -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}p(x,y)log_2p(y\mid x)\tag{2}\label{eq2} $$
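Definitions $(1)$ and $(2)$ can be checked numerically. Below is a minimal Python sketch; the joint pmf values are made up purely for illustration:

```python
from math import log2

# Hypothetical joint pmf p(x, y) over X in {0, 1}, Y in {0, 1} (illustrative values)
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 0.1}

# Marginal p(x) = sum_y p(x, y)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Joint entropy H(X,Y) per definition (1)
H_XY = -sum(p * log2(p) for p in p_xy.values())

# Conditional entropy H(Y|X) per definition (2), using p(y|x) = p(x,y)/p(x)
H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items())

print(H_XY, H_Y_given_X)
```

For this pmf the marginal of $X$ is uniform, so $H(X)=1$ bit and the two quantities differ by exactly one bit, as the chain rule predicts.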

By the chain rule of probability, $p(x,y) = p(x)\,p(y\mid x)$, which yields the chain rule for entropy:

$$ H(X,Y) = H(X)+H(Y \mid X)\tag{3}\label{eq3} $$

giving another decomposition of the joint entropy of $(X,Y)$.
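Identity $(3)$ is easy to verify numerically. A small sketch with an arbitrary illustrative pmf (not from the question):

```python
from math import log2

# Hypothetical joint pmf for checking identity (3): H(X,Y) = H(X) + H(Y|X)
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginal p(x)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

H_XY = -sum(p * log2(p) for p in p_xy.values())
H_X = -sum(p * log2(p) for p in p_x.values())
H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items())

assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12
print("chain rule (3) holds numerically")
```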

A corollary of this theorem is: $$ H(X,Y \mid Z) = H(X \mid Z)+H(Y \mid X,Z) $$

Proof:

\begin{align}
H(X,Y \mid Z) &= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(x,y \mid z) \\
&= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2\big(p(y \mid x,z)\,p(x\mid z)\big) \\
&= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(x\mid z) -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(y \mid x,z)\\
&= -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(y\mid x,z)\,p(x,z)\log_2 p(x\mid z) -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(y \mid x,z)\\
&= -\sum_{x\in \mathcal X}\sum_{z\in \mathcal Z}p(x,z)\log_2 p(x\mid z) -\sum_{x\in \mathcal X}\sum_{y\in \mathcal Y}\sum_{z\in \mathcal Z}p(x,y,z)\log_2 p(y \mid x,z)\\
&= H(X \mid Z)+H(Y \mid X,Z)
\end{align}
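The corollary can also be confirmed numerically on a random joint distribution. A sketch (the pmf is randomly generated, purely for verification):

```python
from math import log2
from itertools import product
import random

# Random hypothetical joint pmf p(x, y, z) over binary X, Y, Z
random.seed(0)
w = [random.random() for _ in range(8)]
total = sum(w)
p = {xyz: wi / total for xyz, wi in zip(product((0, 1), repeat=3), w)}

def marg(keep):
    """Marginalize p onto the coordinate indices in `keep`."""
    out = {}
    for xyz, pr in p.items():
        k = tuple(xyz[i] for i in keep)
        out[k] = out.get(k, 0.0) + pr
    return out

p_z, p_xz = marg((2,)), marg((0, 2))

# H(X,Y|Z) = -sum p(x,y,z) log2 p(x,y|z), with p(x,y|z) = p(x,y,z)/p(z)
H_XY_Z = -sum(pr * log2(pr / p_z[(z,)]) for (x, y, z), pr in p.items())
# H(X|Z) = -sum p(x,z) log2 p(x|z)
H_X_Z = -sum(pr * log2(pr / p_z[(z,)]) for (x, z), pr in p_xz.items())
# H(Y|X,Z) = -sum p(x,y,z) log2 p(y|x,z)
H_Y_XZ = -sum(pr * log2(pr / p_xz[(x, z)]) for (x, y, z), pr in p.items())

assert abs(H_XY_Z - (H_X_Z + H_Y_XZ)) < 1e-12
print("H(X,Y|Z) = H(X|Z) + H(Y|X,Z) holds numerically")
```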

I would like to know if it is right.


Accepted answer:

Without having looked at the details of your proof, I suggest the following alternative argument.

Instead of seeing $X$ and $Z$ as two distinct discrete random variables, we pair them up and interpret $A := (X,Z)$ as a single random variable. Clearly, all information that is contained in $X$ and $Z$ is also contained in $A$ and vice versa. Similarly, we pair up $X$ and $Y$ by defining $B : = (X,Y)$. We then get by applying rule $(3)$:

\begin{align}
H(X \vert Z) + H(Y \vert X,Z) &= H(X \vert Z) + H(Y \vert A) \\
&= H(X,Z) - H(Z) + H(Y,A) - H(A) \\
&= H(X,Z) - H(Z) + H(Y,A) - H(X,Z) \\
&= -H(Z) + H(Y,A) \\
&= -H(Z) + H(Y,X,Z) \\
&= -H(Z) + H(B,Z) \\
&= H(B \vert Z) \\
&= H(X,Y \vert Z).
\end{align}
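This pairing argument only uses marginal entropies, so it can be mirrored directly in code. A sketch with an illustrative pmf, where $H$ of any subset of variables is the entropy of that marginal:

```python
from math import log2

# Hypothetical joint pmf over binary (X, Y, Z); values are illustrative only
p = {(0, 0, 0): 0.05, (0, 0, 1): 0.10, (0, 1, 0): 0.15, (0, 1, 1): 0.05,
     (1, 0, 0): 0.20, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.30}

def H(keep):
    """Entropy of the marginal over the coordinate indices in `keep`."""
    m = {}
    for xyz, pr in p.items():
        k = tuple(xyz[i] for i in keep)
        m[k] = m.get(k, 0.0) + pr
    return -sum(pr * log2(pr) for pr in m.values())

X, Y, Z = 0, 1, 2
# Left side, expanded via rule (3) with A := (X, Z):
#   H(X|Z) + H(Y|A) = [H(X,Z) - H(Z)] + [H(Y,A) - H(X,Z)]
lhs = (H((X, Z)) - H((Z,))) + (H((Y, X, Z)) - H((X, Z)))
# Right side: H(X,Y|Z) = H(B,Z) - H(Z) with B := (X, Y)
rhs = H((X, Y, Z)) - H((Z,))
assert abs(lhs - rhs) < 1e-12
print("pairing argument checks out numerically")
```

Note that `H((Y, X, Z))` and `H((X, Y, Z))` agree, since relabeling the components of a joint variable does not change its entropy, which is exactly the point of the $A$ and $B$ pairings.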

Another answer:

$H(X,Y) = -\sum\limits_{x\in \mathcal X}\sum\limits_{y\in \mathcal Y}p(x,y)\log_2p(x,y)$

$H(X,Y\mid Z)=-\sum\limits_{x\in \mathcal X} \sum\limits_{y\in \mathcal Y}\sum\limits_{z\in \mathcal Z}p(x,y,z)\log_2 p(x,y\mid z)$

$H(X,Y\mid Z)=-\sum\limits_{x\in \mathcal X} \sum\limits_{y\in \mathcal Y}\sum\limits_{z\in \mathcal Z}p(x,y,z)\log_2\left(\frac{p(x,y,z)}{p(x,z)}\cdot\frac{p(x,z)}{p(z)}\right)$

$=-\sum\limits_{x\in \mathcal X} \sum\limits_{y\in \mathcal Y}\sum\limits_{z\in \mathcal Z}p(x,y,z)\log_2\big(p(y\mid x,z)\,p(x\mid z)\big)$

$=-\sum\limits_{x\in \mathcal X} \sum\limits_{y\in \mathcal Y}\sum\limits_{z\in \mathcal Z}p(x,y,z)\log_2 p(x\mid z)-\sum\limits_{x\in \mathcal X} \sum\limits_{y\in \mathcal Y}\sum\limits_{z\in \mathcal Z}p(x,y,z)\log_2 p(y\mid x,z)$

$=-\sum\limits_{x\in \mathcal X}\sum\limits_{z\in \mathcal Z}p(x,z)\log_2 p(x\mid z)-\sum\limits_{x\in \mathcal X} \sum\limits_{y\in \mathcal Y}\sum\limits_{z\in \mathcal Z}p(x,y,z)\log_2 p(y\mid x,z) \\= H(X\mid Z)+H(Y\mid X,Z)$