How does changing the log base affect the interpretation of information entropy


The entropy of a random variable is defined as:

$$H(X)= -\sum_{i=1}^n p_i \log_2(p_i)$$

As far as I understand, this can be interpreted as the average number of yes/no questions one would have to ask to find out the value of the random variable $X$.
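To make the definition concrete, here is a small sketch (the helper name `entropy` is my own) that computes $H(X)$ for a given distribution, assuming base-2 logarithms:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_i p_i * log_base(p_i).

    Terms with p_i == 0 are skipped, using the convention 0*log(0) = 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin needs one yes/no question on average:
print(entropy([0.5, 0.5]))   # 1.0 bit
# Four equally likely outcomes need two yes/no questions:
print(entropy([0.25] * 4))   # 2.0 bits
```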

But what if the log base is changed to, for example, $e$? What would the interpretation be then? Is there an intuitive explanation?


2 Answers

BEST ANSWER

Obviously this interpretation breaks down (at least somewhat) for non-integer bases, but for any logarithm base $b$, not just base $b=2$, we can interpret the entropy with respect to that base, $$H_b(X) = -\sum_{i=1}^n p_i \log_b(p_i)$$ as the average number of $b$-ary questions one would have to ask in order to find out the value of the random variable $X$.

What do I mean by a $b$-ary question? One that has at most $b$ possible answers. Any yes/no question is a $2$-ary question, and if one can reword it cleverly enough, any $2$-ary question can be stated as a yes/no question (since yes and no are both exhaustive and mutually exclusive).

More precisely, a $b$-ary question is one that has $b$ possible answers, which together exhaust all possibilities and which are mutually exclusive. In general, for $b \not= 2$, it might be difficult to think of truly $b$-ary questions which don't involve artificially restricting the possible answers.

Consider, for example, trying to determine the value of a die roll. If you are limited to $2$-ary questions (e.g. "did it roll a $1$?"), then on average it will take $\log_2 6 \approx 2.585$ questions to ascertain what value it rolled. However, if you are allowed to ask the $6$-ary question "which of $1, 2, 3, 4, 5, 6$ did it roll?", then it will take you only $\log_6 6 = 1$ question to ascertain the value.
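The die example can be checked numerically: the same distribution measured in base 2 versus base 6 gives $\log_2 6$ binary questions versus exactly one $6$-ary question.

```python
import math

probs = [1 / 6] * 6  # a fair six-sided die

# Entropy in base 2: average number of yes/no questions needed.
h2 = -sum(p * math.log2(p) for p in probs)
# Entropy in base 6: average number of 6-ary questions needed.
h6 = -sum(p * math.log(p, 6) for p in probs)

print(h2)  # log2(6) ~ 2.585
print(h6)  # 1.0
```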

If you consider the question tree consisting of all possible questions to derive the answer, then $b$ is just the number of branches emanating from each node (since each node is a question). If you increase $b$, you will decrease the number of questions necessary on average to ascertain the answer because each node will have more branches emanating from it, and thus you can traverse a larger portion of the set of possible answers by hitting fewer nodes.

This is in fact very similar to the idea of recurrence relations and recursion trees in computer science; in particular, I encourage you to look at Section 4.4 of Cormen et al., *Introduction to Algorithms*, for an explanation of how the logarithm enters naturally into these types of situations.

We can think of asking a $b$-ary question as dividing the problem of finding the value of the random variable $X$ into $b$ subproblems of equal size -- then the analogy with recursion should (hopefully) become clearer.


One has $$\log_a(x)=\frac{\log b}{\log a}\:\log_b(x)$$ thus $$H_a(X)= -\sum_{i=1}^n p_i \log_a(p_i)=-\frac{\log b}{\log a}\sum_{i=1}^n p_i \log_b(p_i)$$ or $$ H_a(X)=\frac{\log b}{\log a}H_b(X). $$ So changing the base only rescales the entropy by a constant factor.
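A quick numerical sanity check of this change-of-base identity, using an arbitrary example distribution and $a = e$ (nats) versus $b = 2$ (bits):

```python
import math

probs = [0.5, 0.3, 0.2]  # any probability distribution will do
a, b = math.e, 2

def entropy(probs, base):
    """H_base(X) = -sum_i p_i * log_base(p_i)."""
    return -sum(p * math.log(p, base) for p in probs)

h_a = entropy(probs, a)
h_b = entropy(probs, b)

# Verify H_a(X) = (log b / log a) * H_b(X):
print(math.isclose(h_a, (math.log(b) / math.log(a)) * h_b))  # True
```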