Conditional Entropy of Lossy Channel Output


I've seen the following claim in several sources, all of which treat it as completely obvious, but I am unable to prove it, probably due to a lack of basic knowledge of probability and information theory, which I am currently trying to learn on my own.

Claim:

Let $N\in\mathbb{N}$, $N>0$, and let $X$ be a random variable taking values in $\{0, 1\}^N$ (possibly further implicit assumptions are needed?). This random variable is transmitted through a lossy channel that flips each bit independently with probability $p_e$ (i.e., a memoryless binary symmetric channel). The result of the transmission is $Y$. Then the conditional (Shannon) entropy of $X$ given $Y$ is $$ \begin{equation} H(X|Y) = Nh(p_e) = N \left(-p_e \log(p_e) - (1-p_e)\log(1-p_e) \right), \end{equation} $$

where $h(p_e)$ is the Shannon entropy of a Bernoulli trial with parameter $p_e$.

My Attempt:

I tried to use the definition of conditional entropy: $$ \begin{equation} H(X|Y) = -\sum_{x,y\in\{0,1\}^N} p(x,y) \log\frac{p(x,y)}{p(y)}. \end{equation} $$ For $x,y\in\{0,1\}^N$ let $k$ be the Hamming distance between $x$ and $y$, i.e., the number of bits in which $x$ and $y$ differ. Then $p(x,y)= p(x)\,p_e^k(1-p_e)^{N-k}$ (the binomial coefficient $\binom Nk$ only appears when summing over all $y$ at distance $k$ from a fixed $x$). However, if I try to plug this into the formula and convert the sum into a sum over $k$ (which I'm not sure is correct in the first place), I get something like the entropy of a binomial distribution, for which no simple closed form is known. Clearly I'm doing something wrong. How would you prove the above claim (while possibly stating the "obvious" implicit assumptions)? Thanks in advance!
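For what it's worth, the claim can at least be checked numerically by brute force for small $N$. Here is a quick sketch in Python (my own check, not from any of the sources), assuming $X$ is uniform on $\{0,1\}^N$, which seems to be the implicit assumption, and using base-2 logarithms:

```python
import itertools
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy_bsc(N, pe):
    """H(X|Y) for X uniform on {0,1}^N sent through a BSC(pe), by brute force."""
    px = 1.0 / 2**N
    joint = {}   # (x, y) -> p(x, y)
    py = {}      # y -> marginal p(y)
    for x in itertools.product((0, 1), repeat=N):
        for y in itertools.product((0, 1), repeat=N):
            k = sum(a != b for a, b in zip(x, y))    # Hamming distance
            pxy = px * pe**k * (1 - pe)**(N - k)     # note: no binomial coefficient
            joint[(x, y)] = pxy
            py[y] = py.get(y, 0.0) + pxy
    return -sum(p * math.log2(p / py[y])
                for (x, y), p in joint.items() if p > 0)

print(cond_entropy_bsc(3, 0.1))   # agrees with 3 * h(0.1)
```

For $N=3$, $p_e=0.1$ this agrees with $3\,h(0.1)$ to machine precision, consistent with the claim.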


Accepted Answer

Denoting $X=(X_1, X_2, \ldots, X_N)$ and similarly for $Y$, note that, since the channel is memoryless (and the bits of $X$ are independent),

$$ \mathbb{P}(X=(x_1,x_2,\ldots,x_N)|Y=(y_1,y_2,\ldots,y_N))=\prod_{i=1}^N \mathbb{P}(X_i=x_i|Y_i=y_i). $$ From fundamental properties of entropy, it follows that $$ \begin{align} H(X|Y)&=\sum_{i=1}^NH(X_i|Y_i)\\ &=N H(X_i|Y_i). \end{align} $$

You have probably seen the first equality as holding for the unconditional (joint) entropy of independent variables; you may want to work out why it also holds for conditional entropy. The second equality holds because the channel treats each bit identically. It remains to show that $H(X_i|Y_i)=h(p_e)$.
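That last step can be sketched numerically. The following is my own illustration (not part of the answer): it computes $H(X_i\mid Y_i)$ for a single channel use via Bayes' theorem, with an optional biased prior `p1` to show where uniformity of $X_i$ enters:

```python
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def per_bit_cond_entropy(pe, p1=0.5):
    """H(X_i | Y_i) for one BSC(pe) use with prior P(X_i = 1) = p1, via Bayes."""
    H = 0.0
    for y in (0, 1):
        # likelihoods P(Y_i = y | X_i = 1) and P(Y_i = y | X_i = 0)
        lik1 = pe if y == 0 else 1 - pe
        lik0 = (1 - pe) if y == 0 else pe
        py = p1 * lik1 + (1 - p1) * lik0       # marginal P(Y_i = y)
        post1 = p1 * lik1 / py                 # posterior P(X_i = 1 | Y_i = y)
        H += py * h(post1)
    return H

print(per_bit_cond_entropy(0.1))        # uniform prior: equals h(0.1)
print(per_bit_cond_entropy(0.1, 0.9))   # biased prior: differs from h(0.1)
```

With the uniform prior the posterior given $Y_i=y$ puts mass $p_e$ on the flipped value, so the result is exactly $h(p_e)$; with a biased prior the identity fails, which is why uniformity of $X$ is an implicit assumption of the claim.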

Answer

This expands the argument by Stelios (the accepted answer) into a full proof.

Result: Let $N\in\mathbb{N}$, $N>0$, and let $X$ be uniformly distributed on $\{0,1\}^N$ (this is the implicit assumption: uniformity makes the bits of $X$ independent and unbiased, which is what the proof uses). Let $Y$ be the result of transmitting $X$ over a memoryless binary symmetric channel with error probability $p_e$. Then $$H(X|Y) = N\,h(p_e),$$ where $h$ is the binary entropy function.

Proof. Note that $$H(X_i|Y_i = 0) = - \left(\mathbb{P}(X_i=0|Y_i=0)\log\mathbb{P}(X_i=0|Y_i=0) + \mathbb{P}(X_i=1|Y_i=0)\log\mathbb{P}(X_i=1|Y_i=0) \right).$$

By definition of the binary symmetric channel, $\mathbb{P}(Y_i=0|X_i=0) = 1-p_e$ and $\mathbb{P}(Y_i=0|X_i=1) = p_e$; since $X_i$ is uniform, Bayes' theorem gives $\mathbb{P}(X_i=0|Y_i=0) = 1 - p_e$ and $\mathbb{P}(X_i=1|Y_i=0) = p_e$. Hence $H(X_i|Y_i = 0) = h(p_e)$. By symmetry (or by repeating the calculation), $H(X_i|Y_i = 1) = h(p_e)$. In summary, $$H(X_i|Y_i) = \sum_{y\in\{0, 1\}} \mathbb{P}(Y_i=y) H(X_i|Y_i=y)=\sum_{y\in\{0, 1\}} \mathbb{P}(Y_i=y) h(p_e) = h(p_e).$$

We turn to the sum property for conditional entropy. Since the channel is memoryless and the bits of $X$ are independent (which holds for uniform $X$), $$ \mathbb{P}(X=(x_1,x_2,\ldots,x_N)|Y=(y_1,y_2,\ldots,y_N))=\prod_{i=1}^N \mathbb{P}(X_i=x_i|Y_i=y_i). $$ Consider first the case of only four random variables $X_1, X_2, Y_1, Y_2$ such that $\mathbb{P}(X_1,X_2|Y_1,Y_2) = \mathbb{P}(X_1|Y_1)\mathbb{P}(X_2|Y_2)$. By (one) definition of conditional entropy, $$H(X_1,X_2|Y_1,Y_2) = \sum_{y_1, y_2}\mathbb{P}(Y_1=y_1, Y_2 = y_2)H(X_1, X_2|Y_1=y_1, Y_2 = y_2).$$ The second factor in the sum is the entropy of $X_1, X_2$ conditioned on $Y_1=y_1$ and $Y_2=y_2$. In particular, it is the (standard Shannon) entropy of the conditional probability distribution $$(x_1,x_2)\mapsto \mathbb{P}(X_1=x_1, X_2=x_2|Y_1=y_1,Y_2=y_2) = \mathbb{P}(X_1=x_1|Y_1=y_1) \mathbb{P}(X_2=x_2|Y_2=y_2),$$ i.e., the joint entropy of the two independent distributions $\mathbb{P}(X_1|Y_1=y_1)$ and $\mathbb{P}(X_2|Y_2=y_2)$, and the joint entropy of independent variables is the sum of their entropies. Hence $$ \begin{align} H(X_1,X_2|Y_1,Y_2) &= \sum_{y_1, y_2}\mathbb{P}(Y_1=y_1, Y_2 = y_2)H(X_1, X_2|Y_1=y_1, Y_2 = y_2) \\&= \sum_{y_1, y_2}\mathbb{P}(Y_1=y_1, Y_2 = y_2)\left(H(X_1|Y_1=y_1) + H(X_2|Y_2=y_2)\right) \\&= \sum_{y_1}\left(\sum_{y_2}\mathbb{P}(Y_1=y_1, Y_2=y_2)\right)H(X_1|Y_1=y_1) + \sum_{y_2}\left(\sum_{y_1}\mathbb{P}(Y_1=y_1, Y_2=y_2)\right)H(X_2|Y_2=y_2) \\&\overset{(*)}{=} \sum_{y_1}\mathbb{P}(Y_1=y_1)H(X_1|Y_1=y_1) + \sum_{y_2}\mathbb{P}(Y_2=y_2)H(X_2|Y_2=y_2) \\&= H(X_1|Y_1) + H(X_2|Y_2), \end{align}$$ where equality $(*)$ follows from the definition of the marginal distributions. This argument generalizes by induction to $N$ pairs of random variables $(X_i, Y_i)$. Combining everything, with $X=(X_1,\dots, X_N)$ and $Y = (Y_1, \dots, Y_N)$, $$ \begin{equation} H(X|Y)=\sum_{i=1}^NH(X_i|Y_i)=N H(X_i|Y_i)=Nh(p_e). \end{equation} $$
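The tensorization step above can also be sanity-checked numerically. This is a small sketch of my own (illustrative parameters $p_e=0.1$ and a uniform single-bit prior): it builds the two-use joint distribution as a product and compares the conditional entropies directly:

```python
import math

def cond_entropy(joint):
    """H(X|Y) from a dict {(x, y): p(x, y)}, using H(X|Y) = -sum p(x,y) log2 p(x,y)/p(y)."""
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    return -sum(p * math.log2(p / py[y])
                for (x, y), p in joint.items() if p > 0)

pe, p1 = 0.1, 0.5
# single channel use: p(x, y) = p(x) * p(y|x), where the BSC flips with prob pe
bit = {(x, y): (p1 if x else 1 - p1) * (pe if x != y else 1 - pe)
       for x in (0, 1) for y in (0, 1)}
# two independent uses: p((x1,x2), (y1,y2)) = p(x1,y1) * p(x2,y2)
pair = {((x1, x2), (y1, y2)): bit[(x1, y1)] * bit[(x2, y2)]
        for x1 in (0, 1) for x2 in (0, 1)
        for y1 in (0, 1) for y2 in (0, 1)}

print(cond_entropy(pair), 2 * cond_entropy(bit))   # the two values agree
```

Since the pair distribution is an exact product of the per-bit joints, $H(X_1,X_2|Y_1,Y_2)=2\,H(X_1|Y_1)$ holds here up to floating-point error, mirroring the algebra in the proof.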