Channels with memory have higher capacity


I am working through Elements of Information Theory by Cover and Thomas, and the provided solution to one of the problems contains a step I don't understand.

Consider a binary symmetric channel with $Y_i = X_i + Z_i$, where $+$ is mod-2 addition and $X_i, Y_i \in \{0,1\}$. Suppose that $\{Z_i\}$ has constant marginal probabilities $\Pr\{Z_i = 1\} = p = 1 - \Pr\{Z_i = 0\}$, but that $Z_1, Z_2, \ldots, Z_n$ are not necessarily independent. Assume that $Z^n$ is independent of the input $X^n$. Let $C = 1 - H(p, 1-p)$. Show that $\max_{p(x_1,\ldots,x_n)} I(X_1,\ldots,X_n; Y_1,\ldots,Y_n) \ge nC$.
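For concreteness, here is a small simulation of the setup. The Markov-chain construction of the correlated noise, the parameter values, and the use of numpy are my own illustration, not from the book; the point is only that $\{Z_i\}$ can be strongly dependent while keeping the marginal $\Pr\{Z_i = 1\} = p$ constant.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000      # block length
p = 0.2          # marginal P(Z_i = 1)
rho = 0.9        # P(Z_{i+1} = Z_i): persistence, makes the noise correlated

# Correlated noise with constant marginal p: a two-state chain that keeps
# its current value with probability rho and otherwise resamples from
# Bernoulli(p). If Z_{i-1} ~ Bernoulli(p), then
# P(Z_i = 1) = rho*p + (1 - rho)*p = p, so the marginal stays p.
z = np.empty(n, dtype=int)
z[0] = rng.random() < p
for i in range(1, n):
    z[i] = z[i - 1] if rng.random() < rho else (rng.random() < p)

# i.i.d. Bernoulli(1/2) inputs, as in the quoted solution
x = rng.integers(0, 2, size=n)

# Channel: mod-2 addition
y = (x + z) % 2

print("empirical P(Z_i = 1):", z.mean())   # ~ p
print("empirical P(Y_i = 1):", y.mean())   # ~ 1/2, since X is uniform
```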

The solution begins by stating:

$$I(X_1,\ldots,X_n; Y_1,\ldots,Y_n) = H(X_1,\ldots,X_n) - H(X_1,\ldots,X_n \mid Y_1,\ldots,Y_n) = H(X_1,\ldots,X_n) - H(Z_1,\ldots,Z_n \mid Y_1,\ldots,Y_n)$$

where the $X_i$ are chosen i.i.d. according to $\text{Bernoulli}(\tfrac{1}{2})$. I don't understand how the rightmost equality is derived: it isn't an identity I recognize, and the solution doesn't explain it. I'm assuming it's something simple I'm overlooking; could someone offer a hint?
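To make the question as concrete as possible: writing $X^n$ for $(X_1,\ldots,X_n)$ and $\oplus$ for componentwise mod-2 addition, the channel relation $Y^n = X^n \oplus Z^n$ gives $Z^n = X^n \oplus Y^n$, so the equality I can't justify seems to reduce to

$$H(X^n \mid Y^n) \stackrel{?}{=} H(X^n \oplus Y^n \mid Y^n).$$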