Problem: You need to send messages in the five symbols A, B, C, D, E. The frequencies of each symbol are in this order from most frequent to least frequent. Devise a code that will make the average length of messages as short as possible.
I know that A should have the shortest code, B the next shortest and so on with E having the longest code.
But I don't know what other constraints there are. For example, how should I make it so that the code can be decodable?
In the same chapter, an example for encoding the genetic code was given, with the assignment as follows:
A 0
C 10
T 110
G 111
(order of frequency is A, C, T, G for genes) Is there a way to turn this into a 5-symbol code?
You need to use what is known as a prefix code, meaning that no code is a prefix for another code. For example, $\{1,61,766,7716\}$ is a valid prefix code because no single code is a prefix to another. An example of an invalid prefix code is $\{1,61,617,1776\}$, because $1$ is a prefix to $1776$, and $61$ is a prefix to $617$.
One simple solution to your problem is to do the following:
$$A\space{0}$$ $$B\space{10}$$ $$C\space{110}$$ $$D\space{1110}$$ $$E\space{11110}$$
This is a valid prefix code. The more frequently a character is used, the less bits it uses. But, if you wanted to devise a more precise code (for example, A has a frequency "score" of $8$, B has a "score" of $6$, C has a "score" of $5$, etc.) a Huffman code can be built using a binary tree.