Shannon Entropy Minimization


The Shannon Entropy for an observation is given by $ -x \log_2(x)$. Why is the maximum entropy achieved at $x = \frac{1}{e}$, and not at $x = 0$? Could someone provide a logical explanation that justifies the mathematics?


BEST ANSWER

Shannon Entropy, given a discrete probability distribution with probabilities $p_1,p_2,\dots,p_n$ is $$-\sum_i p_i\log_2(p_i).$$

This is not equal to the quantity you gave: your expression $-x\log_2(x)$ is only a single term of this sum. If $n=1$ then we must have $p_1=1$, and then the Shannon Entropy is $0$.

A relevant example is to figure out what value of $p_1$ maximizes Shannon entropy when $n=2$. This can be thought of as asking which coin (not necessarily fair) gives you the most information when you flip it.
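A quick numerical check of that $n=2$ case (a sketch in Python; `coin_entropy` is just an illustrative helper, not part of the answer above):

```python
import math

def coin_entropy(p):
    """Shannon entropy (in bits) of a coin with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Scan biased coins: entropy is maximized by the fair coin, p = 1/2,
# where each flip yields exactly 1 bit.
best_p = max((k / 1000 for k in range(1, 1000)), key=coin_entropy)
print(best_p, coin_entropy(best_p))  # 0.5 1.0
```

Note the contrast with the single term $-p\log_2(p)$: the full two-outcome entropy peaks at $p = \tfrac12$, not at $p = \tfrac1e$, because both terms of the sum count.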

ANSWER

As Martin Leslie mentions, Shannon Entropy, given a discrete probability distribution with probabilities $p_1,p_2,\dots,p_n$ is $$H = -\sum_i p_i\log_2(p_i).$$

The term $-\log_2(p_i)$ is the length of the optimal encoding of an outcome of probability $p_i$. So a string of $N$ independent draws will have an expected encoded length of $NH$, and $-Np_i\log_2(p_i)$ is the expected number of bits spent encoding occurrences of outcome $i$. The fact that $-x\log_2(x)$ is maximized at $x = 1/e$ means that more bits will be spent encoding outcomes of probability $1/e$ than outcomes of any other probability. That's fairly interesting.

As to why the maximum is not at $0$: $-x\log_2 x$ is, again, proportional to the total number of bits spent encoding an outcome of probability $x$. While the code length for an outcome of probability very close to $0$ is very large, the number of times that outcome appears is very small, so it is at least not CLEAR that $0$ should be the winner. And indeed, when you do the math, it isn't.
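The trade-off above can be checked numerically (a Python sketch; the grid scan is just an illustration, not a proof):

```python
import math

def term(x):
    """A single entropy term: expected bits spent on an outcome of probability x."""
    return -x * math.log2(x)

# Scan (0, 1): the term -x*log2(x) peaks at x = 1/e, not at x = 0.
xs = [k / 100000 for k in range(1, 100000)]
best_x = max(xs, key=term)
print(best_x, 1 / math.e)  # 0.36788 0.36787944117144233

# Near 0 the code length -log2(x) blows up, but the frequency x
# shrinks even faster, so the product tends to 0:
print(term(1e-9))  # a tiny value, nowhere near the maximum
```

The calculus agrees: $\frac{d}{dx}\left(-x\log_2 x\right) = -\frac{\ln x + 1}{\ln 2} = 0$ exactly when $x = 1/e$.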