Information theory - Intuition of channel capacity


Question

As stated in *Elements of Information Theory*, given a channel $p(y|x)$, the information channel capacity is defined as

$C = \max_{p(x)} I(X; Y)$

where $X, Y$ are the input and output symbols, and $p(x)$ is the probability mass function of $X$.

Can anyone explain the intuition behind this formula?

My understanding

$C = \max_{p(x)} I(X; Y) = \max_{p(x)} [H(X) - H(X|Y)]$

$H(X)$ can be understood as the expected amount of unknown information (measured in bits) in $X$. $H(X|Y)$ is the expected amount of unknown information remaining about $X$ once $Y$ is known. Thus, $I(X; Y)$ is the expected amount of information (in bits) that $Y$ reveals about $X$.
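To make these quantities concrete, here is a small numerical sketch. The joint distribution $p(x,y)$ below is purely illustrative (a noisy binary channel with a uniform input); the conditional entropy is computed via the chain rule $H(X|Y) = H(X,Y) - H(Y)$:

```python
import numpy as np

# Hypothetical joint distribution p(x, y): rows index X, columns index Y.
# (Illustrative numbers only -- a symmetric noisy binary channel.)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

def H(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = H(p_x)
# Chain rule: H(X|Y) = H(X, Y) - H(Y)
H_X_given_Y = H(p_xy.ravel()) - H(p_y)
I_XY = H_X - H_X_given_Y

print(H_X, H_X_given_Y, I_XY)  # 1.0, ~0.722, ~0.278
```

Here $H(X) = 1$ bit, but observing $Y$ only resolves about $0.278$ of those bits; the remaining $H(X|Y) \approx 0.722$ bits stay uncertain because of the channel noise.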

Thus, $C = \max_{p(x)} I(X; Y)$ gives the maximum number of bits of $X$ that can be exactly recovered from $Y$, maximized over all possible input distributions.
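The maximization over $p(x)$ can also be checked numerically. For a binary symmetric channel with crossover probability $\varepsilon$ (an assumed example, not from the question), a grid search over input distributions recovers the well-known closed form $C = 1 - H_b(\varepsilon)$, attained at the uniform input:

```python
import numpy as np

def H_b(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

eps = 0.1  # crossover probability of the BSC (illustrative)

def mutual_information(q):
    """I(X;Y) for input P(X=1) = q over a BSC with crossover eps."""
    p_y1 = q * (1 - eps) + (1 - q) * eps   # P(Y = 1)
    return H_b(p_y1) - H_b(eps)            # H(Y) - H(Y|X)

# Grid search over input distributions p(x), parameterized by q.
qs = np.linspace(0.0, 1.0, 1001)
C = max(mutual_information(q) for q in qs)

print(C, 1 - H_b(eps))  # both ~0.531: capacity matches 1 - H_b(eps)
```

Note that here the capacity is a property of the channel alone: the max is taken over $p(x)$, so $C$ no longer depends on any particular input distribution.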

There are two problems with my understanding:

  • According to my understanding, for any input symbol $X$, only $I(X; Y)$ bits can be transmitted without error (regardless of whether $H(X)$ exceeds the channel capacity $C$).

  • In *Elements of Information Theory*, $C$ is usually written as $C = \max_{p(x)}[H(Y) - H(Y|X)]$, instead of $\max_{p(x)}[H(X) - H(X|Y)]$ (i.e. my formula). I think $H(Y) - H(Y|X)$ says something about the right intuition behind the formula $C = \max_{p(x)} I(X; Y)$.
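On the second point, the two expressions are always equal, since by symmetry of mutual information $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$; the $H(Y) - H(Y|X)$ form is usually preferred only because $H(Y|X)$ is directly determined by the channel $p(y|x)$. A quick numerical check on an arbitrary joint distribution (illustrative numbers only) confirms the identity:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Any joint p(x, y) works here -- the two forms of I(X;Y) coincide.
p_xy = np.array([[0.30, 0.20],
                 [0.05, 0.45]])

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H_xy = H(p_xy.ravel())

I_from_X = H(p_x) - (H_xy - H(p_y))   # H(X) - H(X|Y)
I_from_Y = H(p_y) - (H_xy - H(p_x))   # H(Y) - H(Y|X)

print(I_from_X, I_from_Y)  # equal: both are H(X) + H(Y) - H(X,Y)
```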