In textbook discussions it is common to characterize the Shannon entropy as giving a *lower bound* on the number of bits needed to encode the symbols of a distribution.
I understand this to mean, more precisely, that it is the average number of bits used by the most efficient encoding (in terms of number of bits) of the symbols in that distribution (or, equivalently, the expected number of bits per symbol drawn from it).
But is it true that the Shannon entropy is a lower bound?
For example, if I have 5 symbols with frequencies
$$\frac{1}{2},\frac{3}{16},\frac{1}{8},\frac{1}{8},\frac{1}{16}$$
the Shannon entropy is $1.95$ (the terms $\log_2(1/p)$ contribute $1$, $2.4$, $3$, $3$, and $4$ bits respectively), but I can encode these symbols as
$$0, 1, 00, 10, 11$$
for an average length of $1.31$ bits per symbol; or (if for some reason the values must be distinct) as
$$0,1,10,11,100$$
for an average length of $1.38$ bits per symbol.
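The arithmetic above can be checked with a short script (a sketch for illustration only; the names `probs`, `lengths_a`, and `lengths_b` are not from the question):

```python
import math

# Symbol frequencies from the question.
probs = [1/2, 3/16, 1/8, 1/8, 1/16]

# Shannon entropy: sum of p * log2(1/p).
entropy = sum(p * math.log2(1 / p) for p in probs)
print(round(entropy, 2))  # 1.95

# Average lengths of the two proposed codes.
lengths_a = [1, 1, 2, 2, 2]  # 0, 1, 00, 10, 11
lengths_b = [1, 1, 2, 2, 3]  # 0, 1, 10, 11, 100
avg_a = sum(p, l) if False else sum(p * l for p, l in zip(probs, lengths_a))
avg_b = sum(p * l for p, l in zip(probs, lengths_b))
print(avg_a, avg_b)  # 1.3125 1.375
```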
Is there some requirement placed on the encoding assumed in defining entropy that my examples violate? Is it true that the Shannon entropy represents a lower bound on the average number of bits needed?
The Shannon entropy represents a lower bound on the average number of bits needed to represent the symbols without losing any information. In other words, the code must be uniquely decodable. The examples you gave do not constitute uniquely-decodable codes. For example, how do you decode 010? Is it 0, 1, 0 or 0, 10?
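The ambiguity can be made concrete by enumerating every way a bit string splits into codewords of the question's first code (a small sketch; the function `parses` is just for illustration):

```python
# The non-uniquely-decodable code from the question.
code = ["0", "1", "00", "10", "11"]

def parses(bits):
    """Return every way to segment `bits` into codewords of `code`."""
    if bits == "":
        return [[]]
    results = []
    for w in code:
        if bits.startswith(w):
            results += [[w] + rest for rest in parses(bits[len(w):])]
    return results

print(parses("010"))  # [['0', '1', '0'], ['0', '10']]
```

Any bit string with more than one parse means the receiver cannot recover the original symbols, so no such code can be counted against the entropy bound.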
An example of a uniquely-decodable code is 0, 10, 110, 1110, and 1111. With this code, each encoded sequence corresponds to exactly one sequence of symbols. For example, the encoded sequence 01110101111110 corresponds to adbec, where a, b, c, d, and e represent the uncoded symbols.
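Because this is a prefix code (no codeword is a prefix of another), it can be decoded greedily, bit by bit, with no backtracking. A minimal sketch, assuming the symbol-to-codeword assignment a=0, b=10, c=110, d=1110, e=1111:

```python
# Prefix code from the answer: no codeword is a prefix of another.
codebook = {"a": "0", "b": "10", "c": "110", "d": "1110", "e": "1111"}
decode_map = {v: k for k, v in codebook.items()}

def decode(bits):
    """Greedy decoding: emit a symbol as soon as the buffer matches a codeword."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in decode_map:
            out.append(decode_map[buf])
            buf = ""
    assert buf == "", "leftover bits: not a valid encoding"
    return "".join(out)

print(decode("01110101111110"))  # adbec
```

The greedy strategy is safe precisely because of the prefix property: once the buffer matches a codeword, no longer codeword could also match, so the decoder never has to reconsider.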