Consider a random variable $X$ taking on the values $1,2,\ldots,8$ with probabilities given by $$\left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}, \frac{1}{64}\right).$$ The entropy of $X$ is 2 bits.
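As a quick sanity check, the stated entropy can be verified directly from the definition $H(X) = \sum_i p_i \log_2(1/p_i)$ (a minimal sketch, using only the probabilities given above):

```python
import math

# Probabilities of the outcomes 1..8 from the question.
probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]

# Entropy H(X) = sum over outcomes of p * log2(1/p), in bits.
entropy = sum(p * math.log2(1 / p) for p in probs)
print(entropy)  # 2.0
```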
My questions are on the following ways to send a message about the outcome. My concerns/questions are in $\color{red}{\text{red}}$.
$``$Suppose that we wish to send a message indicating the outcome of an experiment. One alternative is to send the index of outcome. This description requires 3 bits for any of the outcomes.$"$
$\color{red}{\text{I would think it is 4 bits since I have } 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000}$.
$``$But the win probabilities are not uniform. It therefore makes sense to use shorter descriptions for the more probable indices and longer descriptions for the less probable ones, so that we achieve a lower average description length. For example, we could use the following set of bit strings to represent the eight indices: $$0,10, 110, 1110, 111100, 111101, 111110, 111111."$$ $\color{red}{\text{What is the correspondence between these strings and the 8 outcomes? What is the significance of the 1's?}}$
The average description length in this case is 2 bits, as opposed to 3 bits for the uniform code.$"$ $\color{red}{\text{What is a description length?}}$ $``$Notice that the average description length in this case is equal to the entropy.$"$
As mentioned in the comments, with 3 bits you can start with code $000$ for outcome $1$ and end at $111$ for outcome $8$: the eight binary strings $000, 001, \ldots, 111$ are exactly enough for the eight outcomes, so there is no need to start counting at $0001$.
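This fixed-length code can be sketched in a couple of lines (numbering outcome $k$ with the 3-bit binary representation of $k-1$):

```python
# Fixed-length 3-bit code: outcome 1 -> 000, ..., outcome 8 -> 111.
fixed_code = {outcome: format(outcome - 1, '03b') for outcome in range(1, 9)}

print(fixed_code[1], fixed_code[8])  # 000 111
```

Every codeword has length 3, which is why the uniform description requires 3 bits per outcome.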
The idea is to assign a shorter code to outcomes with higher probabilities so that on average (over multiple messages), describing an outcome will require fewer bits. Looking at the probabilities, we know that the most probable outcome is $1$. We thus associate the shortest code ($0$: one bit) with it. The second most probable is $2$, and we give it the second shortest code ($10$: two bits). We proceed in the same way for the other ones. Notice that here we cannot know which outcomes the four codes of length $6$ bits are associated with, but we know that they correspond to the four outcomes with probability $1/64$ (that is, outcomes $5$, $6$, $7$ and $8$).
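One possible assignment is sketched below (the four 6-bit codewords could be permuted among outcomes $5$–$8$ arbitrarily). A useful property to check is that the code is *prefix-free*: no codeword is a prefix of another, so a stream of concatenated codewords can be decoded unambiguously without separators:

```python
# One possible assignment of codewords to outcomes; the four 6-bit
# codewords could go to outcomes 5-8 in any order.
code = {
    1: '0',
    2: '10',
    3: '110',
    4: '1110',
    5: '111100',
    6: '111101',
    7: '111110',
    8: '111111',
}

# Check the code is prefix-free: no codeword is a prefix of another.
words = list(code.values())
prefix_free = all(
    not b.startswith(a)
    for a in words for b in words if a != b
)
print(prefix_free)  # True
```

This also explains the significance of the $1$'s: a run of $1$'s followed by a $0$ (or a full run of six symbols) tells the decoder where one codeword ends and the next begins.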
To compute the average description length, we observe that the outcome $1$ has probability $1/2$ and it is represented by a code of length $1$ bit. Hence, when sending a message describing the outcome of the random variable, it will be $1$ bit long exactly $1/2$ of the time. The outcome $2$ has probability $1/4$ and it is represented by a code of length $2$ bits, so that $1/4$ of the time the description will be of length $2$ bits, and so on. As a result the average length of the description is: $$ 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} + 3 \cdot \frac{1}{8} + 4 \cdot \frac{1}{16} + 6 \cdot \frac{1}{64} + 6 \cdot \frac{1}{64} + 6 \cdot \frac{1}{64} + 6 \cdot \frac{1}{64} = 2$$ In general, if you have codes $c_1, \dotsc, c_n$, associated with probabilities $p_1, \dotsc, p_n$, the average length of the description will be $$ \sum_{i = 1}^n p_i \cdot \text{length}(c_i),$$ where $\text{length}(c_i)$ denotes the length of the $i$-th code $c_i$.
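The sum above can be reproduced directly (the choice of which 6-bit codeword goes to which of outcomes $5$–$8$ does not matter, since they all have the same length):

```python
# Average description length: sum of p_i * length(c_i) over outcomes.
probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
codes = ['0', '10', '110', '1110', '111100', '111101', '111110', '111111']

avg_len = sum(p * len(c) for p, c in zip(probs, codes))
print(avg_len)  # 2.0
```

The result matches the entropy of $X$, as the quoted passage points out.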
If you want to learn more about such coding schemes, you can take a look at Shannon–Fano coding and Huffman coding.