Shannon's channel coding theorem with bit error probability


On page 168 of MacKay's Information Theory, Inference, and Learning Algorithms (http://www.inference.org.uk/mackay/itila/book.html), in the discussion of communication (with errors) above capacity, MacKay claims that an encoding scheme (let's call it the "auxiliary" scheme) for the binary symmetric channel with transition probability $q$ can be run in reverse, so that the decoder becomes a lossy compressor. Specifically, from the perspective of the auxiliary encoding scheme, "the optimal decoder of the code... typically maps a received vector of length $N'$ to a transmitted vector differing in $qN'$ bits from the received vector".

Capacity-achieving encoding scheme ($\text{rate} = \frac{K}{N}\approx C$): Encoder_1 -> Noisy channel -> Decoder_1

Auxiliary encoding scheme ($\text{rate} = \frac{K}{N'}\approx 1-H_2(q)$): Encoder_2 -> BSC -> Decoder_2

Patched encoding scheme ($\text{rate}\approx \frac{C}{1-H_2(q)}$): Decoder_2 -> Encoder_1 -> Noisy channel -> Decoder_1 -> Encoder_2
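To make the rate bookkeeping concrete, here is a small sketch of the three rates above (the numerical values $f$ and $q$ are illustrative choices of my own, not from the book):

```python
from math import log2

def H2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

f = 0.1            # assumed flip probability of the underlying noisy channel (BSC)
C = 1 - H2(f)      # its capacity
q = 0.2            # tolerated bit error probability of the patched scheme

rate_aux = 1 - H2(q)          # rate of the auxiliary scheme
rate_patched = C / rate_aux   # rate of the patched scheme

print(f"C = {C:.3f}, auxiliary rate = {rate_aux:.3f}, patched rate = {rate_patched:.3f}")
# The patched rate exceeds the capacity C: we communicate above capacity
# at the price of a residual bit error probability of about q.
```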

However, from the perspective of Decoder_2 patched onto the noisy channel, it must handle all possible $N'$-bit inputs, not just typical ones. My question, then, is: how do we prove that the bit error probability of this final patched scheme is at most $q$, as required?


Your question is essentially the following (restated to help readers not familiar with MacKay's textbook and/or the context of the question):

Consider an encoder that maps source words $x\in \{0,1\}^K$ to channel codewords $y \in \{0,1\}^{N'}$ (with $K/N' < 1$), and its corresponding decoder, which maps a received vector $r \in \{0,1\}^{N'}$ to a source word $\hat{x}\in \{0,1\}^K$. Assume that this coding scheme correctly decodes codewords at the output of a BSC with transition probability $q$ (therefore, by Shannon's theorem, it must hold that $K/N'\leq 1-H_2(q)$). Now consider the "reverse" operation:

  • An arbitrary (uniformly distributed) vector $r \in \{0,1\}^{N'}$ is input to the decoder, which, by definition, outputs a source word $\hat{x}\in \{0,1\}^K$.
  • The word $\hat{x}$ is then input to the encoder, which, by definition, outputs a codeword $y \in \{0,1\}^{N'}$.

What is the (error) probability that a certain bit of $y$ differs from the corresponding one of $r$?

First, let's consider in detail how the decoder operates when the encoder/decoder pair is used "as normal". As stated in the same section of MacKay's book, when the maximum possible rate $K/N'=1-H_2(q)$ is employed, the received (noisy) vector $r$ that is input to the decoder is uniformly distributed over $\{0,1\}^{N'}$ (see edit below). The decoder takes the received vector $r$ and essentially determines the most probable (typical) noise sequence $n \in \{0,1\}^{N'}$ for which $r=\hat{y}\oplus n$, where $\hat{y} \in \{0,1\}^{N'}$ is a valid codeword, and returns the estimate $\hat{x}$ as the unique source word corresponding to $\hat{y}$. Note that, for sufficiently large $N'$, the noise sequence $n$ will "typically" consist of $qN'$ ones and $(1-q)N'$ zeros, i.e., $\hat{y}$ and $r$ "typically" differ at $qN'$ positions (bits).
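The claim that the noise sequence "typically" has about $qN'$ ones is just the law of large numbers for the i.i.d. noise bits. A quick simulation sketch (the block length and the value of $q$ are chosen for illustration only):

```python
import random

random.seed(1)
q, N = 0.2, 200_000  # example transition probability and block length

# Draw a BSC noise sequence: each bit is flipped independently with probability q.
flips = sum(1 for _ in range(N) if random.random() < q)

print(flips / N)  # the fraction of ones concentrates near q as N grows
```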

Now, consider the "reverse" operation, again with $K/N'=1-H_2(q)$. A uniformly distributed vector $r$ is provided at the input of the decoder. Note that this vector is distributed exactly the same as the vectors at the input of the decoder under "normal" operation, i.e., $r$ is with high probability equal to the sum of a (valid) codeword plus a (typical) noise sequence. The decoder will generate, as described above, a source word $\hat{x}$ which is in one-to-one correspondence with a valid codeword $\hat{y}$. When $\hat{x}$ is provided at the input of the encoder, the encoder will output precisely that codeword, $y=\hat{y}$, which, as stated above, "typically" differs from $r$ at $qN'$ positions. This implies that the (per-bit) error probability is asymptotically $q$.
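For intuition, one can run this decode-then-re-encode map exhaustively on a small code. The sketch below is my own toy example, not a capacity-achieving code, so the distortion does not match a target $q$ from the rate formula; it uses the perfect Hamming(7,4) code, for which every 7-bit vector lies within Hamming distance 1 of exactly one codeword, so the reverse map changes at most one bit of $r$ and the average per-bit distortion over all $128$ inputs is $112/(128\cdot 7) = 1/8$:

```python
import itertools

# Systematic generator matrix of the Hamming(7,4) code:
# 4 data bits followed by 3 parity bits.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(x):
    """Map a 4-bit source word to its 7-bit codeword (arithmetic mod 2)."""
    return tuple(sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7))

codebook = {x: encode(x) for x in itertools.product([0, 1], repeat=4)}

def decode(r):
    """Minimum-distance decoding: the source word whose codeword is nearest to r."""
    return min(codebook, key=lambda x: sum(a != b for a, b in zip(codebook[x], r)))

# "Reverse" use: feed every 7-bit vector r through decoder then encoder,
# and measure the average per-bit distortion between r and the output y.
total_flips = 0
for r in itertools.product([0, 1], repeat=7):
    y = codebook[decode(r)]
    total_flips += sum(a != b for a, b in zip(r, y))

distortion = total_flips / (128 * 7)
print(distortion)  # 0.125: each r is within distance 1 of a unique codeword
```

A capacity-achieving code at rate $1-H_2(q)$ would instead, for large $N'$, concentrate this distortion at $q$ per bit, as the answer argues.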

P.S.: The above arguments (as well as MacKay's) are informal and non-rigorous. A rigorous treatment requires rate-distortion theory.

Edit: the uniform distribution of the channel output follows because the capacity of the binary symmetric channel is achieved when its output is uniformly distributed, and a capacity-achieving code is assumed.