The following is an excerpt from CLRS's Introduction To Algorithms:
Although we don't prove it here, a prefix code can always achieve the optimal data compression among any character code, and so we suffer no loss of generality by restricting our attention to prefix codes.
Please answer the question by providing or referring to a valid proof of the above statement.
The optimal coding referred to here is called Huffman coding. Huffman gave an algorithm which produces a prefix code which is optimal among character codes:
The Wikipedia article has complete details.
Huffman's original paper, " Method for the Construction of Minimum-Redundancy Codes", contains a proof of optimality.