What is the reasoning behind the definition of Shannon Entropy?


Shannon entropy is defined as the average information content, where the information content of outcome $i$ is $Q_i=-k\log(P_i)$ for some constant $k>0$:

$$ S = \langle Q \rangle = \sum_i Q_i P_i = -k\sum_i P_i\log(P_i) $$

That being said, my question is: why is $\langle Q\rangle = \sum_i Q_i P_i$? More specifically, why is each information content multiplied by the probability of the outcome that produces it? If there are any resources you could suggest, I would also appreciate it. Thank you!
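As a concrete numerical illustration of the formula above (my own sketch, not part of the original question; it assumes $k=1$ and a choice of logarithm base):

```python
import math

def entropy(probs, k=1.0, base=math.e):
    """Shannon entropy S = -k * sum_i P_i * log(P_i), skipping zero-probability terms."""
    return -k * sum(p * math.log(p, base) for p in probs if p > 0)

# A heavily biased coin has low entropy: the outcome is fairly predictable.
print(entropy([0.9, 0.1], base=2))

# A fair coin has the maximal entropy for two outcomes: 1 bit.
print(entropy([0.5, 0.5], base=2))
```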


1 Answer


The angle brackets around $Q$ mean that we are taking the expected value of $Q$ under the given distribution. For a random variable $X$ with probability distribution $\{P_i\equiv P(X_i)\}$, the definition of expected value (https://en.wikipedia.org/wiki/Expected_value) is:

$$\langle X\rangle=\sum_i X_iP(X_i)=\sum_iX_iP_i$$
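Intuitively, the weight $P_i$ is there because outcomes with larger probability occur more often, so they contribute proportionally more to the long-run average. A minimal sketch of this (my own illustration, assuming $k=1$ and natural logarithms) compares the weighted sum $\sum_i Q_i P_i$ against the sample average of the surprisal over many random draws:

```python
import math
import random

# Surprisal of each outcome: Q_i = -log(P_i), with k = 1
probs = [0.5, 0.25, 0.25]
Q = [-math.log(p) for p in probs]

# Expected value as the probability-weighted sum: <Q> = sum_i Q_i * P_i
expected_Q = sum(q * p for q, p in zip(Q, probs))

# The same quantity as a long-run sample average: draw outcomes with
# probabilities P_i and average the surprisal of whatever actually occurred.
random.seed(0)
N = 100_000
samples = random.choices(range(len(probs)), weights=probs, k=N)
sample_avg = sum(Q[i] for i in samples) / N

print(expected_Q)   # -sum_i P_i log(P_i), the Shannon entropy
print(sample_avg)   # approaches expected_Q as N grows
```

The sample average converges to the weighted sum by the law of large numbers, which is exactly why "average information content" is computed by multiplying each $Q_i$ by $P_i$.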