Channel capacity with transition $\Pi:=\begin{bmatrix} 1-p & p & 0 \\ 0 & 1-p & p \\ p & 0 & 1-p \end{bmatrix}$

We have a channel with the transition matrix $\Pi$ defined above. Say its input $X$ and output $Y$ both take values in the alphabet $\{a_1, a_2, a_3\}$.

As I understand, the distribution of $Y$ should be

  • $P(Y = a_1) = (1-p)P(X = a_1) + pP(X = a_3)$
  • $P(Y = a_2) = (1-p)P(X = a_2) + pP(X = a_1)$
  • $P(Y = a_3) = (1-p)P(X = a_3) + pP(X = a_2)$
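These three equations are just the matrix product of the input law with $\Pi$. A quick sanity check in Python (the values $p = 0.3$ and the input distribution are arbitrary choices for illustration):

```python
import numpy as np

p = 0.3  # example channel parameter (any value in [0, 1])

# Row i of Pi is the conditional law of Y given X = a_i.
Pi = np.array([[1 - p, p,     0.0  ],
               [0.0,   1 - p, p    ],
               [p,     0.0,   1 - p]])

px = np.array([0.5, 0.3, 0.2])  # an arbitrary input distribution P(X = a_i)

py = px @ Pi  # P(Y = a_j) = sum_i P(X = a_i) * Pi[i, j]

# Matches the three equations above:
assert np.isclose(py[0], (1 - p) * px[0] + p * px[2])
assert np.isclose(py[1], (1 - p) * px[1] + p * px[0])
assert np.isclose(py[2], (1 - p) * px[2] + p * px[1])
```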

Furthermore, to determine the mutual information $I(X;Y) = H(Y) - H(Y|X)$, we have

$$H(Y) = \sum_{i=1}^3 p(1-p)\log_2\frac{1}{p(1-p)} = 3p(1-p)\log_2\frac{1}{p(1-p)}$$

So if we define $h_2(p):=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}$, we eventually get

$$I(X;Y) = 3p(1-p)\log_2\frac{1}{p(1-p)} + h_2(p)$$

as a function of $p$.

The question is to find the input distribution that maximises $I(X;Y)$. I can find this numerically: the maximum is around $0.69$, at $p \approx 0.26$.

But the second question asks me to express the channel capacity as a function of $p$. Shouldn't this be simply $C=\max_{p\in[0,1]} I(X;Y)$?

Accepted answer:

I don't think I understand your computation of $H(Y)$. Note that $p$ is a fixed parameter of the channel, yet you seem to be using it as an optimisation variable as well. It appears you're conflating $p$ (a single scalar channel parameter) with the input distribution, i.e. the law of $X$, which is a probability vector on $3$ symbols.


Rather than writing $a_i$, let me just write $i$. Then we can write your first three equations succinctly as $$P(Y = i) = P(X = i) (1-p) + P(X = i-1) p,$$ where the subtraction $i-1$ is modulo $3,$ so that $1-1 = 3$.

Now, if we use $\pi = (\pi_1, \pi_2, \pi_3)$ to denote a law on $X$, then by using the above observation, and the definition of entropy, we get

$$ H(Y) = \sum_{i = 1}^3 P(Y = i) \log \frac{1}{P(Y = i)} = \sum_{i} [(1-p) \pi_i + p\pi_{i-1}] \log \frac{1}{(1-p)\pi_i + p\pi_{i-1}}.$$ There is no real simplification of the above in general.
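The formula can be sketched in a few lines of Python (the function name `H_Y` is mine, not standard):

```python
import numpy as np

def H_Y(pi, p):
    """Output entropy H(Y) in bits, for input law pi = (pi_1, pi_2, pi_3)
    on the cyclic channel with parameter p."""
    pi = np.asarray(pi, dtype=float)
    # P(Y = i) = (1 - p) * pi_i + p * pi_{i-1}, index taken modulo 3.
    py = (1 - p) * pi + p * np.roll(pi, 1)
    py = py[py > 0]  # convention: 0 * log(1/0) = 0
    return float(np.sum(py * np.log2(1.0 / py)))
```

As a check, the degenerate input $\pi = (1, 0, 0)$ gives the output law $(1-p,\, p,\, 0)$, whose entropy is the binary entropy $h_2(p)$, while the uniform input gives $H(Y) = \log_2(3)$.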

$H(Y|X)$ is simpler to determine: each input $i$ yields output $i$ with probability $1-p$ and output $i+1$ (again modulo $3$) with probability $p$, so for every $i$, $H(Y|X = i) = h_2(p)$, giving $H(Y|X) = \sum_i \pi_i H(Y|X = i) = h_2(p)$. Notice that this is invariant to the choice of $\pi$.
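A quick numerical confirmation that every row of $\Pi$ has entropy $h_2(p)$, so that $H(Y|X)$ cannot depend on $\pi$ (the value $p = 0.26$ is just an example):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return p * np.log2(1 / p) + (1 - p) * np.log2(1 / (1 - p))

def entropy_bits(dist):
    dist = np.asarray(dist, dtype=float)
    dist = dist[dist > 0]  # convention: 0 * log(1/0) = 0
    return float(np.sum(dist * np.log2(1 / dist)))

p = 0.26  # example channel parameter
Pi = np.array([[1 - p, p,     0.0  ],
               [0.0,   1 - p, p    ],
               [p,     0.0,   1 - p]])

# H(Y | X = i) is the entropy of row i; every row is a permutation
# of (1-p, p, 0), so each conditional entropy equals h2(p).
row_entropies = [entropy_bits(row) for row in Pi]
```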

$I(X;Y)$ can be determined as the difference $H(Y) - H(Y|X) = H(Y) - h_2(p).$ Here notice that $H(Y)$ is a function of $\pi$, and $h_2(p)$ is unaffected by $\pi$. The capacity of the channel is $\max_{\pi} I(X;Y)$. Note that since $h_2(p)$ is not a function of $\pi$, the optimising $\pi$ above is the same as the one that maximises $H(Y)$.

This immediately suggests how to compute the capacity. Since all we need to do is find the $\pi$ that maximises $H(Y),$ let's think about how large this maximum can be. Since $Y$ takes values on $3$ symbols, $H(Y) \le \log_2(3)$, with equality if and only if the law of $Y$ is uniform. Can we choose a $\pi$ that indeed makes the law of $Y$ uniform? This is possible if and only if there exists a $\pi$ such that $$ (1-p) \pi_1 + p \pi_3 = (1-p)\pi_2 + p \pi_1 = (1-p)\pi_3 + p \pi_2. $$ Such a $\pi$ is easy to nail down: $\pi_i = 1/3$ for each $i$ makes every output probability $(1-p)/3 + p/3 = 1/3$. The capacity of the channel is thus $$ \max_\pi I(X;Y) = \log_2(3) - h_2(p).$$
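To double-check the closed form, a crude grid search over the input simplex (with an illustrative $p = 0.26$; the function names are mine) should land on the uniform input and on $\log_2(3) - h_2(p)$:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return p * np.log2(1 / p) + (1 - p) * np.log2(1 / (1 - p))

def mutual_info(pi, p):
    """I(X;Y) = H(Y) - h2(p) in bits for the cyclic channel."""
    pi = np.asarray(pi, dtype=float)
    py = (1 - p) * pi + p * np.roll(pi, 1)
    py = py[py > 0]
    return float(np.sum(py * np.log2(1 / py))) - h2(p)

p = 0.26  # example channel parameter
grid = np.linspace(0, 1, 101)
best = max(mutual_info((a, b, 1 - a - b), p)
           for a in grid for b in grid if a + b <= 1)
capacity = np.log2(3) - h2(p)
# `best` can never exceed the closed form, and a 0.01-step grid
# gets within about 1e-4 of it (the optimum is near-uniform).
```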


Of course, the above calculation is a simplification that works in this particular case; it need not always. In full generality, you would express $I(X;Y)$ as a function of $\pi$ when $X \sim \pi$; call this $J(\pi)$. Note that $J$ is a concave function of $\pi$ (it's worth showing this). Then you solve the convex program $\max_\pi I(X;Y) = \max_\pi J(\pi)$ by setting up the KKT conditions, as is standard. It would be a worthwhile exercise to do this for this question, just to get practice.
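One standard way to solve that convex program numerically, as an alternative to working out the KKT conditions by hand, is the Blahut-Arimoto algorithm. A minimal sketch (function name and iteration count are my own choices; it assumes every output symbol is reachable from the starting input law):

```python
import numpy as np

def blahut_arimoto(Pi, iters=500):
    """Blahut-Arimoto iteration for the capacity (in bits) of a discrete
    memoryless channel with transition matrix Pi, rows P(Y = j | X = i).
    Returns (capacity, optimising input law)."""
    n = Pi.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform input law
    for _ in range(iters):
        q = pi @ Pi  # current output law
        # d[i] = D( Pi[i, :] || q ), the KL divergence in bits
        d = np.array([sum(x * np.log2(x / qj)
                          for x, qj in zip(row, q) if x > 0)
                      for row in Pi])
        pi = pi * np.exp2(d)  # multiplicative update
        pi /= pi.sum()
    return float(pi @ d), pi

# For the cyclic channel above, symmetry keeps the iterate at the uniform
# law, and the returned capacity matches log2(3) - h2(p).
```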