In my algorithms course I have been introduced to the concept of entropy and data compression, mainly using Huffman encoding. I am trying to understand the formula for entropy
$$\sum_i p_i \log\left(\frac{1}{p_i}\right) = -\sum_i p_i\log(p_i)$$
I have been given the following question, which is offered as a justification of the formula, but I am not sure how to solve it.
Consider a distribution over $n$ possible outcomes, with probabilities $p_1,p_2,\dots,p_n$. Assume for simplicity that each $p_i$ is of the form $\frac{1}{2^k}$ with $k \in \mathbb{N}$. Suppose that a sequence of $m$ samples is drawn from the distribution and that for all $1 \leq i \leq n$, the $i^{th}$ outcome occurs exactly $mp_i$ times. Show that if Huffman encoding is applied to this sequence, the resulting encoding will have length
$$\sum_i mp_i \log\left(\frac{1}{p_i}\right)$$
What I understand so far is that the total number of bits used will be $$\sum_i mp_i \cdot \ell_i$$ where $\ell_i$ is the length of the codeword assigned to the $i^{th}$ outcome, so I assume that $$\ell_i = \log\left(\frac{1}{p_i}\right)$$
Then, dividing by $m$, we get $$\sum_i p_i \log\left(\frac{1}{p_i}\right)$$ as the average number of bits needed to represent one outcome of the distribution, which measures how much information the distribution carries.
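As a sanity check, the average-bits formula can be evaluated directly on a small example. This is just a sketch with a hypothetical dyadic distribution of my own choosing, not anything from the course:

```python
import math

# Hypothetical dyadic distribution: each p_i is a power of 1/2.
p = [1/2, 1/4, 1/8, 1/8]
assert abs(sum(p) - 1) < 1e-12

# Average bits per outcome: sum of p_i * log2(1/p_i).
avg_bits = sum(pi * math.log2(1 / pi) for pi in p)
print(avg_bits)  # 1.75
```

Here the outcomes would ideally be coded with 1, 2, 3 and 3 bits respectively, and the weighted average is $\tfrac12\cdot 1 + \tfrac14\cdot 2 + \tfrac18\cdot 3 + \tfrac18\cdot 3 = 1.75$ bits.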
but I am not sure how to show that
$$(\text{length of the codeword for the } i^{th} \text{ outcome}) = \log\left(\frac{1}{p_i}\right)$$
Any help would be appreciated.
If the probabilities are of the form $p_i = 2^{-k_i}$ (a "dyadic distribution"), then it's easy (assuming you understand how Huffman codes are constructed) to show that the binary Huffman code gives a full binary tree in which the $i^{th}$ outcome sits at depth $k_i$. Hence the codeword for the $i^{th}$ outcome has length $k_i=\log (1/p_i)$.
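You can also convince yourself empirically by running the standard Huffman merging procedure on a small dyadic distribution and checking the resulting codeword lengths. A minimal sketch (the `huffman_code_lengths` helper and the example distribution are mine, for illustration only):

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Run the Huffman construction and return each symbol's
    codeword length (its depth in the final tree)."""
    # Heap entries: (probability, tiebreak, symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        for s in s1 + s2:
            depths[s] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return depths

# Hypothetical dyadic distribution.
p = [1/2, 1/4, 1/8, 1/8]
lengths = huffman_code_lengths(p)
print(lengths)  # [1, 2, 3, 3] — each equals log2(1/p_i)
```

The key observation for the proof itself is that in a dyadic distribution the two smallest probabilities are always equal, so each Huffman merge combines two nodes of weight $2^{-k}$ into one of weight $2^{-(k-1)}$, keeping the distribution dyadic at every step.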