Trying to understand Shannon entropy


In my algorithms course I have been introduced to the concept of entropy and data compression, mainly using Huffman encoding. I am trying to understand the formula for entropy

$$\sum_i p_i \log\left(\frac{1}{p_i}\right) = -\sum_i p_i\log(p_i)$$

I have been given the following question, which is provided as a justification for the formula, but I am not quite sure how to solve it.

Consider a distribution over $n$ possible outcomes, with probabilities $p_1,p_2,\dots,p_n$. Assume for simplicity that each $p_i$ is of the form $\frac{1}{2^k}$ with $k \in \mathbb{N}$. Suppose that a sequence of $m$ samples is drawn from the distribution and that for all $1 \leq i \leq n$, the $i^{th}$ outcome occurs exactly $mp_i$ times. Show that if Huffman encoding is applied to this sequence, the resulting encoding will have length

$$\sum_i mp_i \log\left(\frac{1}{p_i}\right)$$

What I understand so far is that the total number of bits used will be $$\sum_i mp_i \cdot (\text{number of bits in the codeword for outcome } i),$$ so I assume that $$(\text{number of bits in the codeword for outcome } i) = \log\left(\frac{1}{p_i}\right)$$

Then, dividing the total length by $m$ (or just taking $m = 1$), we get $$\sum_i p_i \log\left(\frac{1}{p_i}\right),$$ the average number of bits needed to represent one outcome of the distribution, which tells us how much information the distribution provides,

but I am not sure how to show that

$$(\text{number of bits in the codeword for outcome } i) = \log\left(\frac{1}{p_i}\right).$$
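
As a numeric sanity check (not a proof), here is a small computation with an example dyadic distribution of my own choosing, assuming the codeword lengths really are $\log_2(1/p_i)$:

```python
from math import log2

p = [1/2, 1/4, 1/8, 1/8]  # example dyadic distribution (my own choice)
m = 8                      # number of samples, chosen so every m*p_i is an integer

entropy = sum(pi * log2(1 / pi) for pi in p)         # bits per sample
total_bits = sum(m * pi * log2(1 / pi) for pi in p)  # length of the whole encoding

print(entropy)         # 1.75
print(total_bits)      # 14.0
print(total_bits / m)  # 1.75, i.e. the average bits per sample equals the entropy
```

The numbers do come out consistent, but this only checks the claimed formula; it does not show why the codeword lengths take that form.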

Any help would be appreciated.


There are 2 best solutions below


If the probabilities are of the form $p_i = 2^{-k_i}$ (a "dyadic distribution"), then it's easy (assuming you understand how Huffman codes are constructed) to show that the binary Huffman code places the $i^{th}$ symbol at depth $k_i$ in the code tree; when all the $k_i$ are equal, this tree is perfectly balanced. Hence the codeword for the $i^{th}$ symbol has length $k_i = \log(1/p_i)$.
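
Here is a rough sketch (my own example, not part of the exercise) that builds a standard binary Huffman code for a dyadic distribution and checks that each codeword length equals $\log_2(1/p_i)$:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword length per symbol for a standard binary Huffman code."""
    # Heap entries: (probability, tie-breaker, symbols contained in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths

probs = [1/2, 1/4, 1/8, 1/8]  # dyadic example
for p, length in zip(probs, huffman_lengths(probs)):
    print(p, length, log2(1 / p))  # length matches log2(1/p) for every symbol
```

The tie-breaking counter only makes the heap entries comparable; it does not change which probabilities get merged.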


OK, this is not a definitive answer, just some pointers to help build some intuition.

Entropy is an expected value, specifically $-E[\log p] = E\left[\log\frac{1}{p}\right]$, with the logarithm taken base 2 when measuring in bits. Then recall that a logarithm counts the digits needed to represent a number. The base-10 logarithm measures how many decimal digits separate a number from 1: $10^3 = 1000$, which takes 3 decimal digits ("000") to get down to 1. Equivalently, $\log_2$ counts how many binary digits (bits) are needed to "reach up to" (or down to) 1.
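
A tiny check of that digit-counting intuition (the specific numbers are just illustrative):

```python
from math import log2, log10

print(log10(1000))      # 3.0 -> 1000 takes 3 decimal digit-steps to get down to 1
print(log2(8))          # 3.0 -> a probability of 1/8 corresponds to a 3-bit codeword
print(log2(1 / (1/8)))  # 3.0 -> the same quantity written as log2(1/p)
```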