I am a layman interested in understanding why the foundation of Shannon's entropy is logarithmic.
To that end I've read the answers here, at the Cross Validated Stack, but I'm not technical enough to infer the basic idea from the math.
But in trying to make a semi-educated guess I infer the following, and am asking here whether this is a reasonable explanation.
Shannon's entropy is logarithmic because the CHANCES of multiple independent information events occurring together are multiplied ... but should all those events occur, the total VALUE of those events is summed.
For example, betting on coin flips. The chance of heads is $1/2$, while the chance of three heads in a row is $1/8 = (1/2)^3$, since each flip is independent.
Assuming the bet for each flip is also independent, the total won for winning each of the three flips is $B_1+B_2+B_3$.
This is a logarithmic relationship because we can express:
- (a) An exponent as a log - the chance being 'to the power of'
- (b) The result of said event as a sum. In other words, logarithms let you express multiplications as sums and vice versa.
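The coin-flip arithmetic above can be checked directly; a minimal Python sketch of the product-of-chances versus sum-of-surprisals idea:

```python
import math

# Probability of heads on one fair, independent flip.
p = 0.5

# Chances of independent events multiply: three heads in a row.
p_three = p * p * p  # 1/8

# Surprisal (information content) of an event in bits: log2(1/p).
surprisal_one = math.log2(1 / p)          # 1 bit per flip
surprisal_three = math.log2(1 / p_three)  # 3 bits total

# The log turns the product of chances into a sum of surprisals.
assert math.isclose(surprisal_three, 3 * surprisal_one)
```

This is exactly point (b): the logarithm converts the multiplication of probabilities into an addition of information values.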
Thanks.
There are many possible characterizations of entropy, including Shannon, Rényi, and other entropies. For example, John Baez discusses some of this in terms of information loss.
Regarding Shannon entropy specifically, it was first axiomatically characterized by Khinchin; see Shalizi's notes for example. Basically, assuming a finite range of cardinality $k$ for the random variable $X$, say $X\in \{1,2,\ldots,k\}$, if you want that

- $H$ depends continuously on the probabilities $P(X=i)$,
- $H$ is maximized when $X$ is uniform over its range,
- adding an extra outcome of probability zero leaves $H$ unchanged, and
- the entropy of a pair $(X,Y)$ equals $H(X)$ plus the expected conditional entropy of $Y$ given $X$,
then the only function satisfying these axioms is the Shannon entropy $$ H(X)=\sum_{i=1}^k P(X=i) \log \frac{1}{P(X=i)} $$ up to the choice of the base of the logarithm.
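The formula is easy to evaluate numerically; a small Python sketch (the function name `shannon_entropy` is mine):

```python
import math

def shannon_entropy(probs, base=2):
    """H(X) = sum_i P(X=i) * log(1/P(X=i)), skipping zero-probability terms."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy.
print(shannon_entropy([0.5, 0.5]))  # → 1.0

# A biased coin carries less: its outcome is more predictable.
print(shannon_entropy([0.9, 0.1]))
```

Zero-probability terms are skipped because $p \log(1/p) \to 0$ as $p \to 0$, which is also why the expansibility axiom holds for this formula.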
Prior to Shannon and Khinchin, Hartley had defined entropy only with respect to uniform distributions, in effect just counting the number of points and ignoring the probability distribution, giving simply $\log k$, which is of course the maximum entropy in the general case.
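The claim that Hartley's $\log k$ is the maximum of the Shannon entropy can be checked numerically; a quick Python sketch (the helper name is mine):

```python
import math

def shannon_entropy(probs, base=2):
    # H(X) = sum_i p_i * log(1/p_i), skipping zero-probability terms.
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

k = 8
uniform = [1 / k] * k

# On the uniform distribution Shannon entropy reduces to Hartley's log k ...
assert math.isclose(shannon_entropy(uniform), math.log2(k))

# ... and a non-uniform distribution on the same k points has strictly less.
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]
assert shannon_entropy(skewed) < math.log2(k)
```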