I'm reading Elements of Information Theory, 2nd edition (2006). On page 195, the book gives the formula:
The rate $R$ of an $(M, n)$ code is
$\hspace{5.0cm} R = \frac{\log M}{n}$ bits per transmission
The definition of an $(M, n)$ code is given as
An $(M, n)$ code for the channel $(\cal{X}, p(y|x), \cal{Y})$ consists of the following:
An index set $\{1, 2, ..., M\}$ (i.e. indices of possible messages)
An encoding function $X^n: \{1, 2, ..., M\} \to \cal{X}^n$, yielding codewords $x^n(1), ..., x^n(M)$. The set of codewords is called the code-book
A decoding function
$\hspace{5.0cm} g: {\cal{Y}}^n \to \{1, 2, ..., M\}$,
which is a deterministic rule that assigns a guess to each possible received vector
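To make sure I read the definition right, here is a toy $(M, n)$ code I constructed myself (it is my own example, not from the book): a binary repetition code with $M = 2$ messages and block length $n = 3$, decoded by majority vote.

```python
import math

# Hypothetical toy example (my own, not from the book): a binary
# repetition code with M = 2 messages and block length n = 3.
M, n = 2, 3

# Encoding function X^n: {1, 2} -> X^3, written out as the codebook
codebook = {1: (0, 0, 0), 2: (1, 1, 1)}

# Decoding function g: Y^3 -> {1, 2}; here a deterministic rule that
# picks the message whose codeword agrees with y in the most positions
def g(y):
    return max(codebook, key=lambda m: sum(a == b for a, b in zip(codebook[m], y)))

# Rate of the code in bits per transmission
R = math.log2(M) / n  # log2(2) / 3 = 1/3 bit per transmission

print(R)
print(g((0, 1, 0)))  # decodes back to message 1 despite one flipped symbol
```

So for this code, $R = \frac{\log 2}{3} = \frac{1}{3}$: three channel uses are spent to convey one bit of message index.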
According to CMU documents, the messages $m$ passed to $X^n$ come from a compressor, which compresses messages coming from the source (i.e. source coding). Under this assumption, I cannot find any intuitive understanding of the code rate formula.
According to another document, the message $m$ passed to $X^n$ comes directly from some information source (i.e. $m$ is the original message that hasn't been compressed yet). Under this assumption, I understand the formula $R = \frac{\log M}{n}$ as follows:
The information source uses $\log M$ bits to represent each possible message $m$ in the set of possible messages $\{1, 2, ..., M\}$ (i.e. the naive fixed-length representation of symbols, as used in ASCII)
Since each message from $\{1, 2, ..., M\}$ is represented by a string $S$ of $n$ symbols from $\cal{X}$
$\hspace{1.0cm} \rightarrow R = \frac{\log M}{n}$ can be understood as the average number of bits of the original message that each symbol in $S$ is responsible for representing.
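To check this reading numerically (the numbers here are hypothetical values I chose for illustration, not from the book): with $M = 16$ possible messages, indexing one message takes $\log_2 16 = 4$ bits, and if each message is mapped to a codeword of $n = 8$ channel symbols, then each transmitted symbol accounts for half a bit of the message.

```python
import math

M, n = 16, 8  # hypothetical values chosen for illustration

bits_per_message = math.log2(M)  # 4 bits to index one of 16 messages
R = bits_per_message / n         # bits of message carried per channel symbol

print(bits_per_message, R)  # 4.0 0.5
```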
Is my understanding correct? If not, please give me an explanation of the formula $R = \frac{\log M}{n}$.