Consider the random variable $X$ taking values from the alphabet $A_X=\{S_1,S_2\}$ with equal probability, $P(X=S_1)=\frac{1}{2}=P(X=S_2)$.
Now, let us take $X^N=(X_1,X_2,\dots,X_N)$, a string of $N$ i.i.d. copies of $X$.
There are $2^N$ possible outcomes, each with probability $\frac{1}{2^N}$.
If we assign a binary name to each outcome as a codeword, each codeword will have a length of
$N$ bits, so the average codelength will also be $N$ bits and it will be equal to the entropy of
$X^N$, since $H(X^N)=N H(X)=N$.
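As a sanity check, here is a small script (the value $N=5$ is an arbitrary choice for illustration) confirming that the uniform distribution over $2^N$ outcomes has entropy exactly $N$ bits, matching the $N$-bit fixed-length code:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 5  # block length (arbitrary choice for this check)
probs = [1 / 2**N] * 2**N  # 2^N equally likely outcomes
H = entropy_bits(probs)
# Each outcome gets an N-bit codeword, so the average codelength is N,
# which equals H(X^N) = N * H(X) = N.
print(H)  # 5.0
```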
Here comes my naive question:
Let us assign an $N$-bit binary name to each outcome except one, and to that one outcome assign a codeword of $M$ bits, where $M<N$ (for example, the codeword $111\dots 1$ consisting of $M$ ones).
Then the average codelength is slightly less than $N$ bits, apparently violating the entropy, which is supposed to be a lower bound on the average codelength.
Such a code is not a prefix code, but it is uniquely decodable since we can program the decoder to recognize it by its length. That is, as long as we encode and decode blocks of $N$ symbols.
I am pretty sure that the mistake in my thinking lies in the previous sentence, but I cannot really pinpoint it. Any help?
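For what it's worth, if the codewords of successive blocks are simply concatenated (rather than framed one block at a time), such a code is in fact not uniquely decodable. A brute-force parser on a hypothetical small instance ($N=3$, $M=2$, with the outcome $111$ given the short name $11$; these particular values are my own choice for illustration) exhibits a colliding bitstream:

```python
# Hypothetical small instance of the code described above: N = 3, M = 2.
# Every 3-bit name except '111', which is replaced by the shorter '11'.
codewords = [format(i, '03b') for i in range(7)] + ['11']

def parses(stream, codewords):
    """All ways to split `stream` into a concatenation of codewords."""
    if stream == '':
        return [[]]
    out = []
    for c in codewords:
        if stream.startswith(c):
            out += [[c] + rest for rest in parses(stream[len(c):], codewords)]
    return out

# Two encoded blocks back to back can be ambiguous:
print(parses('11011', codewords))  # [['110', '11'], ['11', '011']]
```

The stream `11011` decodes either as the blocks $110,\,111$ or as $111,\,011$, so the decoder cannot "recognize the codeword by its length" once blocks are concatenated.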
Thank you very much.
Trivial question on Information Theory — asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
I'm not an expert, but here's an idea. No information has been lost in this scheme; in fact, you are doing strictly worse. Giving one outcome a shorter codeword might seem to reduce the average codelength, but it does not: for the decoder to tell the two codeword lengths apart, you would have to prepend one extra bit to every codeword indicating whether the entry has length $N$ or $M$. With that flag included, the average codelength becomes
$$\bar{L}=\frac{(2^{N}-1)(N+1)+M+1}{2^N}=N+1-\frac{(N-M)}{2^N}>N$$
which is trivially greater than $N$, since for $N\ge 1$ and $1\le M<N$:
$$2^N>N>N-M$$
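A quick numerical check of this average-length formula (the values $N=5$, $M=3$ are an arbitrary choice with $M<N$):

```python
N, M = 5, 3  # example values with M < N (arbitrary choice)

# Average length with a 1-bit flag prepended to every codeword:
# (2^N - 1) codewords of length N+1, and one of length M+1.
avg = ((2**N - 1) * (N + 1) + (M + 1)) / 2**N

# Closed form from the answer: N + 1 - (N - M)/2^N.
closed_form = N + 1 - (N - M) / 2**N

print(avg, closed_form, avg > N)  # 5.9375 5.9375 True
```

So the flagged code averages $5.9375 > 5$ bits per block, consistent with the entropy bound.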