The source coding theorem says that information transfer with a variable-length code uses, on average, a number of bits equal to the entropy of the distribution. It also says that there is no code that uses fewer bits (or does it? Have I misunderstood this?). My question is: there is obviously a way to transmit the same information using fewer bits than the entropy. For example:
Let
$P(x) = \frac{1}{2}, \frac{1}{4}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}, \frac{1}{64}$ for $x = 1, 2, 3, 4, 5, 6$ respectively.
If, following the source coding theorem, a variable-length code with lengths $-\log_2(p(x))$ is used, then
the average number of bits used is $1\cdot\frac{1}{2} + 2\cdot\frac{1}{4} + 4\cdot\frac{1}{16} + 5\cdot\frac{1}{32} + 6\cdot\frac{1}{64} + 6\cdot\frac{1}{64} = 1.5938$.
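(A quick numerical check of that figure; this is just a sketch re-doing the arithmetic above in Python.)

```python
import math

# Probabilities from the example, for x = 1..6
p = [1/2, 1/4, 1/16, 1/32, 1/64, 1/64]

# Code lengths -log2 p(x); all integers here since each p(x) is a power of two
lengths = [int(-math.log2(px)) for px in p]   # [1, 2, 4, 5, 6, 6]

avg_len = sum(px * lx for px, lx in zip(p, lengths))
entropy = -sum(px * math.log2(px) for px in p)

print(avg_len)   # 1.59375
print(entropy)   # 1.59375, the same value, because every length equals -log2 p(x)
```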
But obviously, there is a better way of doing this:
Just use the following codes: 0,1,10,11,100,101.
This definitely uses fewer bits on average (about $1.03 \lt 1.5938$). So what is the actual meaning of the source coding theorem?
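(Again just a sketch with the numbers above: the average length of the code $0,1,10,11,100,101$ under the same probabilities.)

```python
p = [1/2, 1/4, 1/16, 1/32, 1/64, 1/64]
lengths = [1, 1, 2, 2, 3, 3]   # lengths of the codes 0, 1, 10, 11, 100, 101
print(sum(px * lx for px, lx in zip(p, lengths)))   # 1.03125, indeed < 1.5938
```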
The source coding theorem restricts attention to uniquely decodable codes, which in practice are the only useful ones.
To see that your code is not uniquely decodable, and hence practically useless, imagine that you receive the sequence $10$: you cannot know whether it corresponds to a concatenation of your first two codewords, $1|0$, or to the single codeword $10$ (your third symbol).
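Concretely (a small sketch, not part of the original answer), encoding two different messages with the code from the question yields the same bit string:

```python
# Codewords from the question: symbol x -> code(x)
code = {1: "0", 2: "1", 3: "10", 4: "11", 5: "100", 6: "101"}

def encode(symbols):
    return "".join(code[s] for s in symbols)

# Two different messages produce the same bit string -> not uniquely decodable
print(encode([3]))     # '10'
print(encode([2, 1]))  # '10'
```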
Usually, one restricts the set of acceptable codes even further, to prefix codes. It can be shown that nothing is lost by doing so (in terms of average length), and prefix codes are easier to decode and to analyze.
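For illustration (a sketch, with a prefix code chosen by hand rather than taken from this answer): a prefix code with exactly the lengths $-\log_2 p(x)$ from the question, e.g. $0, 10, 1100, 11010, 110110, 110111$, achieves the same 1.5938-bit average and can be decoded greedily without any ambiguity:

```python
# A prefix code with codeword lengths 1, 2, 4, 5, 6, 6 (= -log2 p(x) in the question)
prefix_code = {1: "0", 2: "10", 3: "1100", 4: "11010", 5: "110110", 6: "110111"}
inverse = {w: s for s, w in prefix_code.items()}

def encode(symbols):
    return "".join(prefix_code[s] for s in symbols)

def decode(bits):
    """Greedy decoding: since no codeword is a prefix of another, each match is final."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

msg = [3, 1, 2, 6, 1]
print(decode(encode(msg)) == msg)   # True: the bit stream parses in only one way
```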