I'm reading a tutorial on information gain, and I came across the following page:

I know in the example above, I can encode this way:
A 0
B 10
C 11
and then this will need $1/3\cdot 1+1/3\cdot 2+1/3\cdot 2\approx 1.6667$ bits per symbol. But how can I encode the symbols to achieve the theoretical goal of about $1.58$ bits per symbol? Is that possible in practice?
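As a quick sanity check on the numbers above, this sketch compares the expected length of the prefix code A→0, B→10, C→11 against the Shannon entropy of the source, which is the theoretical lower bound:

```python
import math

# Symbol probabilities: A, B, C each occur with probability 1/3.
probs = [1 / 3, 1 / 3, 1 / 3]
# Codeword lengths for the prefix code A -> 0, B -> 10, C -> 11.
lengths = [1, 2, 2]

# Expected code length in bits per symbol: sum of p_i * len_i.
avg_len = sum(p * l for p, l in zip(probs, lengths))

# Shannon entropy H = -sum p_i log2 p_i, here log2(3) ~ 1.585 bits.
entropy = -sum(p * math.log2(p) for p in probs)

print(f"average code length: {avg_len:.4f} bits/symbol")    # ~1.6667
print(f"entropy lower bound: {entropy:.4f} bits/symbol")    # ~1.5850
```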
You can't do that by using binary codewords for single symbols A, B and C.
As mentioned in the comments, for example, arithmetic coding can be used to get near the limit when the input sequence is long.
Another simple example: You can consider "super-symbols" that consist of 100 As, Bs and Cs and encode one super-symbol at a time, so that each of the $3^{100}$ possible super-symbols corresponds to a $159$-bit binary codeword ($3^{100}\leq 2^{159}$), giving you effectively $1.59$ bits per original symbol. Or sequences of $10000$ symbols, so that each sequence corresponds to a $15850$ bit codeword ($3^{10000}\leq 2^{15850}$), giving effectively $1.585$ bits per symbol.
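The super-symbol arithmetic can be checked directly: encoding blocks of $n$ symbols with fixed-length codewords costs $\lceil n\log_2 3\rceil$ bits per block, so the per-symbol rate approaches $\log_2 3\approx 1.585$ as $n$ grows. A minimal sketch (`bits_per_symbol` is a name of my choosing):

```python
import math

def bits_per_symbol(n: int) -> float:
    """Bits per original symbol when each block of n ternary symbols
    (3**n possibilities) is mapped to a ceil(n*log2(3))-bit codeword."""
    return math.ceil(n * math.log2(3)) / n

for n in (1, 100, 10000):
    print(f"block size {n:>5}: {bits_per_symbol(n)} bits/symbol")
# block size 1:     2.0
# block size 100:   1.59
# block size 10000: 1.585
```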