I have the following encoding: A = 0, B = 10, C = 11, with probabilities $P(A)=1/2$, $P(B)=1/3$, $P(C)=1/6$. I calculated the average length (in bits) per symbol of this encoding as follows: $$\frac12 \cdot 1 + \frac13 \cdot 2 + \frac16 \cdot 2 = 1.5\ .$$ The book asks whether it is possible to achieve $\sqrt{2} \approx 1.414$. I thought about it, but making the codewords longer would only increase the average, and making them shorter would mix symbols together. For instance, with a shortened code I would get lost decoding a string like $011100100$, which under the current code decodes uniquely to $ACBABA$.
Is it possible, or am I right that it's not? If it's not, how can I argue this?
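A quick sanity check of the numbers in Python (a sketch; the `decode` helper is my own illustration, not from the book):

```python
# Codewords and probabilities as given in the question.
code = {"A": "0", "B": "10", "C": "11"}
probs = {"A": 1/2, "B": 1/3, "C": 1/6}

# Average length: sum over symbols of p_i * len(codeword_i).
avg_len = sum(probs[s] * len(code[s]) for s in code)
print(avg_len)  # 1.5

def decode(bits):
    """Greedy left-to-right decoding; it works because no
    codeword is a prefix of another (the code is prefix-free)."""
    inv = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:
            out.append(inv[buf])
            buf = ""
    return "".join(out)

print(decode("011100100"))  # ACBABA
```

This confirms both the average of 1.5 bits per symbol and the unique decodability of the example string.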
Given a finite probability distribution $p:=(p_i)_{1\leq i\leq n}$, its entropy is defined by $$H(p):=-\sum_{i=1}^n p_i \log_2(p_i)\ .$$ If $p$ models the frequencies of the letters of an alphabet, then $H(p)$ is a lower bound on the average number of bits per letter of any uniquely decodable encoding. This is the essential content of Shannon's source-coding theorem, and cannot be explained in a few lines. In the case $p=\bigl({1\over2},{1\over3},{1\over6}\bigr)$ one obtains $H(p)\approx 1.45915$. This is what you can approach "in the limit" through a clever encoding, e.g., by encoding blocks of several letters at a time. But $\sqrt2\approx 1.41421$ is below $H(p)$, hence definitely not attainable under the given circumstances.
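The entropy value quoted above can be checked directly (a minimal sketch of the computation):

```python
from math import log2, sqrt

# Entropy H(p) = -sum_i p_i * log2(p_i) for p = (1/2, 1/3, 1/6).
p = [1/2, 1/3, 1/6]
H = -sum(pi * log2(pi) for pi in p)
print(round(H, 5))  # 1.45915

# sqrt(2) lies strictly below the entropy, so no uniquely
# decodable code can reach it.
print(sqrt(2) < H)  # True
```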