Why is the information rate of an error-correcting code defined by $\frac{\log_q\left(|\mathcal{C}|\right)}{n}$?


Let $\mathcal{C}$ be an error-correcting code of length $n$ and $q$ the number of distinct symbols of the alphabet. We define the information rate of $\mathcal{C}$ by $$\frac{\log_q\left(|\mathcal{C}|\right)}{n}.$$ I know it generalizes the concept which is connected to the time needed for the transmission of information, but why is it defined like that?

Best answer:

You have $|\cal C|$ different codewords, each of length $n.$ The number of $q$-ary strings of length $\ell$ is $q^\ell,$ so you need the smallest length $\ell$ satisfying $$ q^{\ell} \geq |\cal C| $$ in order to represent the distinct message words, each one corresponding to a unique codeword. Taking logs to base $q$ (strictly speaking there should be a ceiling function on $\log_q(|\cal C|),$ but let's ignore this) gives $$ \ell = \log_q(|\cal C|). $$ The transmitted codewords have length $n,$ so the ratio of information-bearing symbols to the total number of symbols transmitted is the rate of the code: $$ \frac{\log_q(|\cal C|)}{n}. $$
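As a quick sanity check, here is a short Python sketch of this formula, evaluated on two standard examples (the binary repetition code of length 3 and the Hamming(7,4) code; the function name is mine, not from the question):

```python
import math

def information_rate(num_codewords, n, q):
    """Rate = log_q(|C|) / n: information symbols per transmitted symbol."""
    return math.log(num_codewords, q) / n

# Binary repetition code of length 3: C = {000, 111}, so |C| = 2, q = 2.
rate_rep3 = information_rate(2, 3, 2)       # 1/3

# Hamming(7,4): 2^4 = 16 codewords of length 7 over the binary alphabet.
rate_hamming = information_rate(16, 7, 2)   # 4/7
```

Both values match the familiar $k/n$ form, since for these codes $\log_q(|\cal C|) = k$, the number of message symbols.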

Edit: So, if you have a message $\ell$ symbols long, you can define the remaining $n-\ell$ symbols as parity symbols by means of some set of equations (usually, but not always, linear). See, for example, Wikipedia.
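The simplest instance of such a parity equation is a single even-parity check over a binary message, which gives a code of length $n = \ell + 1$ and rate $\ell/(\ell+1)$. A minimal sketch (the function name is illustrative, not from the answer):

```python
def encode_even_parity(message_bits):
    """Append one parity bit so the codeword has an even number of 1s.

    The message is l bits; the codeword is n = l + 1 bits, so the
    rate is l / (l + 1).
    """
    parity = sum(message_bits) % 2
    return message_bits + [parity]

codeword = encode_even_parity([1, 0, 1])  # parity bit is 0: [1, 0, 1, 0]
```

Here the one linear equation is $c_1 + c_2 + \cdots + c_n \equiv 0 \pmod 2$; longer codes simply impose more such equations, one per parity symbol.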