What does entropy capture that variance does not?


Consider a discrete distribution like this one: $[0.6,0.15,0.1,0.08,0.05,0.02]$

Its entropy (base 2) is $-\sum_i p_i\log_2 p_i = 1.805$ bits, and the variance of the probability values is $\frac{\sum_i(p_i - \bar{p})^2}{n} = 0.0392$.
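These numbers are easy to reproduce; a minimal Python sketch (assuming, as the figures suggest, that the entropy is taken base 2 and the variance is taken over the probability values themselves):

```python
import math

p = [0.6, 0.15, 0.1, 0.08, 0.05, 0.02]

# Shannon entropy in bits (the base-2 log reproduces the 1.805 figure)
entropy = sum(-pi * math.log2(pi) for pi in p)

# Variance of the probability values themselves (the question's definition)
p_bar = sum(p) / len(p)
variance = sum((pi - p_bar) ** 2 for pi in p) / len(p)

print(round(entropy, 3))   # 1.805
print(round(variance, 5))  # 0.03919
```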

They both measure the spread of this distribution. For distributions like this that are far from uniform, what information does one capture that the other does not?


There are 3 best solutions below


Variance is sensitive to the scale of the distribution while entropy is not. If $X$ is a random variable with finite support, then $X$ and $100X$ have the same entropy, but different variances.
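A quick sketch of this point (the outcome values and probabilities are just for illustration; entropy is a function of the probabilities only, variance of the outcome values as well):

```python
import math

def entropy(ps):
    """Shannon entropy in bits: depends only on the probabilities p_i."""
    return sum(-p * math.log2(p) for p in ps if p > 0)

def variance(xs, ps):
    """Var(X) = E[(X - E[X])^2]: depends on the outcome values x_i."""
    mean = sum(p * x for x, p in zip(xs, ps))
    return sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))

xs = [1, 2, 3, 4, 5, 6]
ps = [0.6, 0.15, 0.1, 0.08, 0.05, 0.02]
xs_scaled = [100 * x for x in xs]  # the random variable 100*X

print(entropy(ps))  # unchanged if xs is replaced by xs_scaled
print(variance(xs_scaled, ps) / variance(xs, ps))  # 100^2 = 10000, up to rounding
```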


You have to be more careful with what your outcomes are and what their probabilities are. From what I see you have 6 outcomes, let's call them $x_1,\dots,x_6$, with probabilities $p_1,\dots,p_6$ given in your list.

The outcomes can have cardinal values, e.g. throwing an (unfair) die -> $x_1 = 1,\dots, x_6 = 6$. They can also be nominal, such as ethnicity -> $x_1 =$ black, $x_2 =$ caucasian, etc.

In the first case, it makes sense to define mean and variance $$ \overline x = \sum_{i=1}^{6} p_ix_i, \qquad \mathbb V = \sum_{i=1}^{6} p_i (x_i-\overline x)^2. $$ The variance measures the (quadratic) spread around the mean. Note that this definition is different from yours.
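For the unfair-die case these formulas give, for example (using the probabilities from the question):

```python
# Unfair die: outcomes 1..6 with the probabilities from the question
xs = [1, 2, 3, 4, 5, 6]
ps = [0.6, 0.15, 0.1, 0.08, 0.05, 0.02]

# Mean and variance over the outcome values, weighted by probability
mean = sum(p * x for x, p in zip(xs, ps))
var = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))

print(round(mean, 4))  # 1.89
print(round(var, 4))   # 1.7779
```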

In the second case, mean and variance do not make any sense, since you cannot add black to caucasian or scale them, square them etc.

The entropy, on the other hand, can be defined in both cases! Intuitively, it measures the uncertainty of the outcome.

Note that, as Mike Hawk pointed out, it does not care what the outcomes actually are. They can be $x_1 = 1,\dots, x_6 = 6$ or $x_1 = 100,\dots, x_6 = 600$ or ($x_1 =$ black, $x_2 =$ caucasian, etc.); the result will only depend on the probabilities $p_1,\dots,p_6$. The variance, on the other hand, will be very different for the first two cases (by a factor of $10000$) and will not exist in the third case.

Your definition of variance is very unconventional: it measures the spread of the actual probability values instead of the outcomes. I think that theoretically this can be made sense of, but I very much doubt that this is the quantity you wish to consider (especially as a medical doctor).

It is definitely not meaningful to compare it to entropy, which measures the uncertainty of the outcome. The entropy is maximal if all outcomes have equal probability $1/6$, whereas this would yield the minimal value 0 for your definition of variance...

Hope this helps.


They both measure the spread of this distribution.

I believe you are correct: they both measure the spread of the distribution, but in some cases, one is more useful than the other:

  • Variance measures how far the data is from the mean: $\mathbb V_i = (p_i - \bar p)^2$
    • If the data doesn't vary a lot, there is a small spread about the mean: $p_i \approx \bar p \iff \mathbb V_i \approx 0$
  • Entropy measures how uncertain you are about the outcome: $\mathbb E_i = -p_i \log p_i$
    • If you are very confident about the outcome, the entropy is very low: $p_i \approx 0$ or $p_i \approx 1 \implies \mathbb E_i \approx 0$
    • Conversely, if you are not confident about the event, the entropy is high

Note that if there is very little spread in the probabilities, the variance is small. However, the entropy is large, because the probabilities are similar so we are now uncertain about the outcome.

For distributions like this that are far from uniform, what information does one capture that the other does not

  • Variance of the distribution: "how far away is it from a uniform distribution?"

    • This simply comes about by noting that $\bar p = \frac{1}{n} \sum_i p_i = \frac{1}{n}$. So we get $\mathbb V_i = (p_i - \frac{1}{n})^2$.
    • It also helps to understand the bounds
      • Minimum: $\mathbb V = 0$ when $p$ is Uniform
      • Maximum: $\mathbb V = \frac{n - 1}{n^2}$ when $p = \{1,0,0...\}$
  • Entropy of the distribution: "how close are we to a uniform distribution?"

    • Again let's consider the bounds
      • Minimum: $\mathbb E = 0$ when $p = \{1, 0, 0, ...\}$
      • Maximum: $\mathbb E = \log n$ when $p$ is Uniform
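These bounds can be checked numerically; a small sketch, where the `p_variance` helper implements the question's variance-of-probabilities definition:

```python
import math

def entropy(ps):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in ps if p > 0)

def p_variance(ps):
    """Variance of the probability values themselves (the question's definition)."""
    n = len(ps)
    return sum((p - 1 / n) ** 2 for p in ps) / n

n = 6
uniform = [1 / n] * n
degenerate = [1.0] + [0.0] * (n - 1)

print(p_variance(uniform))     # minimum: 0.0
print(p_variance(degenerate))  # maximum: (n-1)/n^2 = 5/36 ≈ 0.1389
print(entropy(degenerate))     # minimum: 0.0
print(entropy(uniform))        # maximum: log2(6) ≈ 2.585
```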

As you mentioned, these measure similar things, but the properties are slightly different:

  • Consider the limit $n \to \infty$: then $\mathbb V_{\text{max}} = \frac{n-1}{n^2} \to 0$ while $\mathbb E_{\text{max}} = \log n \to \infty$. This means that variance becomes an uninformative measure of uncertainty for large $n$
  • Entropy can also be interpreted as how many bits are required to represent the outcomes

Aside:

Derivation of $\mathbb V_{\text{max}} = \frac{n - 1}{n^2}$ when $p = \{1,0,0,...\}$

$$ \begin{align} \mathbb V_{\text{max}} &= \frac{1}{n} \sum_{i=1}^n \left(p_i - \frac{1}{n}\right)^2 \\ &= \frac{1}{n} \left( (n - 1)\frac{1}{n^2} + \left(1 - \frac{1}{n}\right)^2 \right) \\ &= \frac{1}{n} \cdot \frac{(n-1) + (n-1)^2}{n^2} \\ &= \frac{n-1}{n^2} \end{align} $$

Note that I haven't actually proved that this is the maximum.
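A brute-force numerical check (not a proof) is easy, though: sample many random distributions and confirm none exceeds the claimed bound.

```python
import random

def p_variance(ps):
    """Variance of the probability values (the question's definition)."""
    n = len(ps)
    return sum((p - 1 / n) ** 2 for p in ps) / n

random.seed(0)
n = 6
bound = (n - 1) / n ** 2

# Sample random distributions on 6 outcomes; none should beat the bound
for _ in range(100_000):
    w = [random.random() for _ in range(n)]
    total = sum(w)
    ps = [x / total for x in w]
    assert p_variance(ps) <= bound

print("no counterexample found; bound =", bound)
```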