How to show this equivalence between two definitions of channel capacity

63 Views Asked by At

Let us consider a random variable $X$ on alphabet $\mathcal{X}$ with probability distribution $p_X$ and a conditional probability distribution (channel) $W_{Y|X}$ that generates a random variable $Y$ on alphabet $\mathcal{Y}$. Let the distribution

$$q_Y(y) = \sum_{x\in\mathcal{X}}p_X(x)W_{Y|X}(y|x)$$

The capacity of a channel is given by

$$C(W) = \max_{p_X} I(X:Y)_p = \max_{p_X}\sum_{x, y \in\mathcal{X, Y}}p_X(x)W_{Y|X}(y|x)\log\frac{W_{Y|X}(y|x)}{q_Y(y)}$$

I have seen an alternate version of channel capacity given by

$$C(W) = \min_{q_Y}\max_{x\in\mathcal{X}} D(W_{Y|X}(\cdot|x)\|q_Y) = \min_{q_Y}\max_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} W_{Y|X}(y|x)\log\frac{W_{Y|X}(y|x)}{q_Y(y)}$$

I don't really see how these two expressions are equal and would appreciate any pointers that lead in the right direction

1

There are 1 best solutions below

0
On BEST ANSWER

To avoid confusion ($q_Y$ appeared twice in the post, while with different usages), I will use $P_X$ and $P_Y$ to denote the input and output distributions, respectively, and they are related through $$ P_{Y}(y) = \sum_{x \in \mathcal{X}} W_{Y|X}(y|x) P_X(x).\tag{*}$$ Then, the first definition can be written as $$C(W) = \max_{P_X} D(P_{X}W_{Y|X}\| P_X P_Y),$$ where $P_Y$ is related to $P_X$ through (*).

The second definition can be rewritten as $$ C(W) = \min_{Q_Y} \max_{x\in \mathcal{X}} D(W_{Y|X = x} \| Q_Y). \tag{#} $$ Note that since the maximum is taken over a collection of non-negative terms, we can rewrite $$ \max_{x\in \mathcal{X}} D(W_{Y|X = x} \| Q_Y) = \max_{P_X} \sum_{x \in \mathcal{X}} P_X(x) D(W_{Y|X = x} \| Q_Y) = \max_{P_X} D(P_X W_{Y|X} \| P_X Q_Y). $$ As a result, RHS of (#) can be rewritten as

\begin{align*} \min_{Q_Y} \max_{x\in \mathcal{X}} D(W_{Y|X = x} \| Q_Y) &= \min_{Q_Y} \max_{P_X} D(P_X W_{Y|X} \| P_X Q_Y)\\ &\overset{(\rm a)}{=} \max_{P_X} \min_{Q_Y} D(P_X W_{Y|X} \| P_X Q_Y)\\ &\overset{(\rm b)}{=} \max_{P_X} D(P_X W_{Y|X} \| P_X P_Y), \end{align*} which coincides with the first definition.

Here,

  • $P_Y$ in the last equation is also the output distribution associated with $P_X$ defined by (*)

  • step (a) comes from minimax theorem, (you may check the conditions apply here)

  • step (b) is due to the fact that, for any distribution $P_{X, Y}$ with marginal distributions $P_X$, $P_Y$, $$D(P_{X, Y}\| P_{X}Q_{Y}) - D(P_{X, Y}\|P_X P_Y) = D(P_Y \| Q_Y) \geq 0.$$ Therefore, $$\min_{Q_Y} D(P_{X, Y}\| P_{X}Q_{Y}) = D(P_{X, Y}\|P_X P_Y).$$