I'm having trouble understanding what the right way to think about the presentation of discrete channels and codes in text by Cover and Thomas.
A discrete channel is a system consisting of an alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ and a probability transition matrix $p(y|x)$ expressing the probability of observing output symbol $y$ given that we send symbol $x.$
How do we typically think of the $\mathcal{X}$ and $\mathcal{Y}$ alphabets here? Are we mapping individual symbols from $\mathcal{X}$ to individual symbols of $\mathcal{Y}$? In this case, both alphabets should have the same cardinality? I don't see how we could take $\mathcal{X} = \{a,b,\ldots, z\}$ and $\mathcal{Y} = \{0,1\}$, for example.
Next, after introducing channel extensions, they define:
An $(M,n)$ code for the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ consists of
- An index set $\{1,\ldots, M\}$
- Encoding function $X^n:\{1,\ldots, M\} \rightarrow \mathcal{X}^n,$ yielding codewords $x^n(1), x^n(2), \ldots, x^n(M).$ The set of codewords is called the code-book.
- Decoding function $g:\mathcal{Y}^n \rightarrow \{1,\ldots, M\},$ deterministic rule assigning guess to each possible received vector.
So we are taking some arbitrary "message" from the index set, encoding it using $n$ symbols from $\mathcal{X}$, sending those $n$ symbols over the channel, receiving a noisy version of $n$ symbols in $\mathcal{Y}$, and then trying to recover the original message based on those $n$ symbols?
Why do the codewords all have to have length $n$? What's the advantage of letting $\mathcal{X}$ be different from $\mathcal{Y}$?
I think I understand the formal stuff going on here, but my intuition as to why it's interesting is a bit lacking. Can someone shed some light on this?
It's not strictly necessary to have $\mathcal{X}$ equal to $\mathcal{Y}$. They can also be different sets. The purpose of a channel based on $(M,n)$-codes is to map a word of symbols from an alphabet to another word of symbols of another alphabet of the same length $n$ (i.e. so blocks of symbols). However, in a channel code you can also have communication errors, that are modeled by a conditional probability table $p(y\mid x)$. The advantage of letting $\mathcal{X}$ be different from $\mathcal{Y}$ is the fact that the communication channel can be defined in order to switch the alphabet from $\mathcal{X}$ to $\mathcal{Y}$, but not necessarily.