I am a layman interested in understanding why the foundation of Shannon's entropy is logarithmic.
To that end I've read the answers here, at the Cross Validated Stack, but I'm not technical enough to infer the basic idea from the math.
But in trying to make a semi-educated guess I infer the following, and am asking here whether this is a reasonable explanation.
Shannon's entropy is logarithmic because the CHANCES of multiple independent information events occurring together are multiplied ... but should all those events occur, the total VALUE of those events is summed.
For example, betting on coin flips. The chance of heads is $1/2$, while the chance of three heads in a row is $1/8 = (1/2)^3$, since each flip is independent.
Assuming the bet for each flip is also independent, the total won for winning each of the three flips is $B_1+B_2+B_3$.
This is a logarithmic relationship because we can express:
- (a) An exponent as a log - the chance being 'to the power of'
- (b) The result of said event as a sum. In other words, logarithms let you express multiplications as sums and vice versa.
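The coin-flip arithmetic above can be checked directly; a minimal Python sketch of the product-of-chances versus sum-of-surprisals idea:

```python
import math

# Probability of heads on one fair, independent flip.
p = 0.5

# Chances of independent events multiply: three heads in a row.
p_three = p * p * p  # 1/8

# Surprisal (information content) of an event in bits: log2(1/p).
surprisal_one = math.log2(1 / p)          # 1 bit per flip
surprisal_three = math.log2(1 / p_three)  # 3 bits total

# The log turns the product of chances into a sum of surprisals.
assert math.isclose(surprisal_three, 3 * surprisal_one)
```

This is exactly point (b): the logarithm converts the multiplication of probabilities into an addition of information values.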
Thanks.
There are many possible characterizations of entropy, including Shannon, Rényi, and other entropies. For example, John Baez discusses some of this in terms of information loss.
Regarding Shannon entropy specifically, it was first axiomatically characterized by Khinchin; see Shalizi's notes for example. Basically, assuming a finite range of cardinality $k$ for the random variable $X$, say $X\in \{1,2,\ldots,k\}$, if you want that

- $H$ depends continuously on the probabilities $P(X=i)$,
- $H$ is maximized when $X$ is uniform over its range,
- adding an extra outcome of probability zero leaves $H$ unchanged, and
- the entropy of a pair $(X,Y)$ equals $H(X)$ plus the expected conditional entropy of $Y$ given $X$,
then the only function satisfying these axioms is the Shannon entropy $$ H(X)=\sum_{i=1}^k P(X=i) \log \frac{1}{P(X=i)} $$ up to the choice of the base of the logarithm.
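The formula is easy to evaluate numerically; a small Python sketch (the function name `shannon_entropy` is mine):

```python
import math

def shannon_entropy(probs, base=2):
    """H(X) = sum_i P(X=i) * log(1/P(X=i)), skipping zero-probability terms."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy.
print(shannon_entropy([0.5, 0.5]))  # → 1.0

# A biased coin carries less: its outcome is more predictable.
print(shannon_entropy([0.9, 0.1]))
```

Zero-probability terms are skipped because $p \log(1/p) \to 0$ as $p \to 0$, which is also why the expansibility axiom holds for this formula.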
Prior to Shannon and Khinchin, Hartley had defined entropy only with respect to uniform distributions, in effect just counting the number of points and ignoring the probability distribution, giving simply $\log k$, which is of course the maximum entropy in the general case.
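The claim that Hartley's $\log k$ is the maximum of the Shannon entropy can be checked numerically; a quick Python sketch (the helper name is mine):

```python
import math

def shannon_entropy(probs, base=2):
    # H(X) = sum_i p_i * log(1/p_i), skipping zero-probability terms.
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

k = 8
uniform = [1 / k] * k

# On the uniform distribution Shannon entropy reduces to Hartley's log k ...
assert math.isclose(shannon_entropy(uniform), math.log2(k))

# ... and a non-uniform distribution on the same k points has strictly less.
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]
assert shannon_entropy(skewed) < math.log2(k)
```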