Binary symmetric channel: Why is $\sum p(x) H(Y \mid X = x) = \sum p(x) H(p)$?


I have this example for a simple binary symmetric channel (BSC), bounding the mutual information of $X$ and $Y$ as

\begin{align*} I(X;Y) &= H(Y) - H(Y|X)\\ &= H(Y) - \sum p(x) H(Y \mid X = x) \\ &= H(Y) - \sum p(x) H(p) \\ &= H(Y) - H(p) \\ &\leq 1 - H(p) \end{align*}

However, as the title states, I don't really understand why I can write

\begin{align*} \sum p(x) H(Y \mid X = x) = \sum p(x) H(p) \end{align*}

I know that

\begin{align*} \mathbb{P}[Y = 0 \mid X = 0 ] &= 1 - p \\ \mathbb{P}[Y = 1 \mid X = 0 ] &= p \\ \mathbb{P}[Y = 1 \mid X = 1 ] &= 1 - p \\ \mathbb{P}[Y = 0 \mid X = 1 ] &= p \end{align*}

But suppose I set $p = \frac{1}{3}$. Would that mean that I have

\begin{align*} I(X;Y) \leq 1- H(p) = 1- H(\frac{1}{3}) \approx 0.4716 \text{ bit} \end{align*}

I ask because if this is the case, why is it not

\begin{align*} I(X;Y) \leq 1- H(1-p) = 1- H(\frac{2}{3}) \approx 0.61 \text{ bit} \end{align*}

instead?

Or, and this would make the most sense to me, it's actually $p = (p_{\text{error}}, 1-p_{\text{error}}) = (\frac{1}{3}, \frac{2}{3})$, and thus we have

\begin{align*} I(X;Y) \leq 1- H(p) = 1- H(\frac{1}{3}, \frac{2}{3}) \approx 0.0817 \text{ bit} \end{align*}


BEST ANSWER

The reason for the validity of the equation

\begin{equation} \sum p(x) H(Y \mid X = x) = \sum p(x) H(p) \end{equation}

can perhaps be better seen if we denote the right-hand side by

\begin{equation} \sum p(x) H_b(p) \end{equation}

where $H_b(\cdot)$ is the binary entropy function (https://en.wikipedia.org/wiki/Binary_entropy_function). To see this, note that the defining property of the BSC is precisely that, regardless of the source symbol $X$, an error (a bit flip) occurs with a fixed probability $p$. In other words:

\begin{equation} \forall x: H(Y \mid X = x) = H(Err \mid X = x) = H_b(p) \end{equation}

where the first equality is due to the fact that for a binary input the entropy of the "error" is equal to the entropy of $Y$, and the second equality follows from the paragraph above.
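A quick numerical sanity check of this (Python; the helper `Hb` and the choice $p = \frac{1}{3}$ are just for illustration, not part of the original answer). It computes $H(Y \mid X = x)$ from the BSC transition probabilities for each input $x$ and compares it with $H_b(p)$:

```python
import math

def Hb(p):
    """Binary entropy function H_b(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 1/3  # crossover (bit-flip) probability of the BSC

# Conditional distribution of Y given each input x:
#   given X = 0: P(Y=0) = 1-p, P(Y=1) = p
#   given X = 1: P(Y=0) = p,   P(Y=1) = 1-p
for x, dist in [(0, [1 - p, p]), (1, [p, 1 - p])]:
    H_given_x = -sum(q * math.log2(q) for q in dist)
    print(f"H(Y|X={x}) = {H_given_x:.4f}, Hb(p) = {Hb(p):.4f}")
```

Both lines print the same value, confirming $H(Y \mid X = x) = H_b(p)$ for every $x$.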

ANSWER

It seems to me that you are confusing $H(X) = - \sum_i Pr(x_i) \log ( Pr(x_i) )$ (the entropy) and $H(p) = -p \log_2(p) - (1-p) \log_2(1-p)$, which is called the binary entropy function.

The difference is whether the input is a random variable or a probability.

Note that $H(\frac 1 3) = H(\frac 2 3)$, since the binary entropy function is symmetric: $H(p) = H(1-p)$.
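This symmetry (and the bound from the question) can be verified numerically; the following Python sketch just evaluates the binary entropy function at the two values:

```python
import math

def Hb(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(Hb(1/3))      # ≈ 0.9183
print(Hb(2/3))      # ≈ 0.9183, same value by symmetry
print(1 - Hb(1/3))  # ≈ 0.0817 bit, the bound from the question
```

So the last interpretation in the question is the right one: $H(p)$ here means the binary entropy, and the bound is $\approx 0.0817$ bit.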

ANSWER

I just realized that it's actually very simple to show that

\begin{align*} \sum_{x \in X} \mathbb{P}[X = x]H(Y|X=x) &= H_B(p) \end{align*}

We start by observing that

\begin{align*} \sum_{x \in X} \mathbb{P}[X = x]H(Y|X=x) &= p_X(0) \cdot H(Y|X=0) + p_X(1) \cdot H(Y|X = 1) \\ &= \sum_{x \in X} p_X(x) \cdot \Big( - \sum_{y \in Y} p_{Y|X}(y|x) \log_2 p_{Y|X}(y|x) \Big) \end{align*}

and since

\begin{align*} \mathbb{P}[X = 0] H(Y|X=0) &= p_X(0) \cdot \Big( -p_{Y|X}(0|0) \log_2 p_{Y|X}(0|0) - p_{Y|X}(1|0) \log_2 p_{Y|X}(1|0) \Big) \\ &= p_X(0) \cdot \underbrace{\Big(- (1-p)\log_2 (1-p) - p \log_2 p \Big)}_{H((1-p),\,p)} \\ &=p_X(0) \cdot H((1-p),p) \\ &=p_X(0) \cdot H_B(p) \\ \mathbb{P}[X = 1] H(Y|X=1) &= p_X(1) \cdot H_B(p) \end{align*}

we have

\begin{align*} -\sum_{x \in X} p_X(x)\sum_{y\in Y} p_{Y|X}(y|x) \log_2 p_{Y|X}(y|x) &= \mathbb{P}[X = 0] H(Y|X=0) + \mathbb{P}[X = 1] H(Y|X=1) \\ &= p_X(0) \cdot H_B(p) + p_X(1) \cdot H_B(p) \\ &= H_B(p) \end{align*}

where the last step uses $p_X(0) + p_X(1) = 1$.
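The calculation above can be sketched numerically (Python; the input distribution `pX` is an arbitrary hypothetical choice, since the result holds for any input distribution):

```python
import math

def Hb(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 1/3                  # BSC crossover probability
pX = {0: 0.25, 1: 0.75}  # arbitrary input distribution (hypothetical)

# BSC transition probabilities p_{Y|X}(y|x):
pYgX = {(0, 0): 1 - p, (1, 0): p, (1, 1): 1 - p, (0, 1): p}

# H(Y|X) = sum_x p_X(x) * H(Y|X=x)
HYgX = sum(
    pX[x] * -sum(pYgX[(y, x)] * math.log2(pYgX[(y, x)]) for y in (0, 1))
    for x in (0, 1)
)
print(HYgX, Hb(p))  # equal: H(Y|X) = Hb(p) regardless of pX
```

Changing `pX` to any other distribution leaves `HYgX` unchanged, which is exactly why the $p_X(x)$ terms drop out of the sum.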