The entropy of a discrete random variable, taking value $i$ with probability $p_i$, is defined as
$$ H(X)=-\sum_{i}p_i\log (p_i),$$
and can be seen as the amount of information gained once we are told the value of $X$; it is a measure of the disorder of the system. Clearly $H(X)$ is always non-negative. When the definition is extended to a continuous random variable $Y$ with density $f$, it is called differential entropy and is defined as
$$H(Y)=-\int f(y) \log(f(y)) dy.$$
In this case $H(Y)$ can be negative. How, then, is it a good extension of the definition, given that it can no longer be seen as a measure of the "information gained once we are told the value of $Y$"?
There is no obvious interpretation of differential entropy that is as meaningful or useful as that of entropy. The problem with continuous random variables is that each individual value typically has probability 0, so encoding the exact value would require an infinite number of bits.
Once you realize this, it is still a useful concept. For example, you can define an $\epsilon$-entropy: the expected cost in bits of encoding the random variable $X$ so that the encoded value $X'$ is within $\epsilon$ of $X$. This is just the idea of lossy source coding, which applies to discrete distributions as well.
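As a sketch of this idea (a hypothetical quantiser, not an actual coding scheme; the Gaussian choice and the sample size are my own assumptions): rounding a standard Gaussian $X$ to the nearest multiple of $2\epsilon$ guarantees the encoded value $X'$ is within $\epsilon$ of $X$, and the entropy of the quantised variable comes out close to $h(X) - \log_2(2\epsilon)$ bits:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05
x = rng.normal(size=100_000)

# Hypothetical quantiser: round to the nearest multiple of 2*eps,
# so the encoded value is within eps of the original.
x_quant = np.round(x / (2 * eps)) * (2 * eps)

# Empirical entropy of the quantised variable.
_, counts = np.unique(x_quant, return_counts=True)
p = counts / counts.sum()
H_emp = -np.sum(p * np.log2(p))

h_gauss = 0.5 * np.log2(2 * np.pi * np.e)  # differential entropy of N(0,1) in bits
print(H_emp)                    # the two printed numbers nearly agree
print(h_gauss - np.log2(2 * eps))
```

The coarser the tolerance $\epsilon$, the fewer bits the quantised variable costs, exactly as the $-\log_2(2\epsilon)$ term suggests.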
If you take the limit of the discrete entropy obtained by measuring the probabilities of the intervals $[n\varepsilon, (n + 1)\varepsilon)$, you end up with
$$-\int p(x) \log_2 p(x) \, dx - \log_2 \varepsilon$$
and not the differential entropy. This quantity is in a sense more meaningful, but diverges to infinity as we take smaller and smaller intervals. That makes sense: we need more and more bits to encode which of the many intervals the value of our random variable falls into.
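A quick numerical check of this relation (assuming, for concreteness, a standard Gaussian, whose differential entropy in bits is $\tfrac12\log_2(2\pi e)$):

```python
import numpy as np
from scipy.stats import norm

eps = 0.01
edges = np.arange(-10.0, 10.0 + eps, eps)
p = np.diff(norm.cdf(edges))        # probability of each interval [n*eps, (n+1)*eps)
p = p[p > 0]
H_binned = -np.sum(p * np.log2(p))  # discrete entropy of the binned variable

h = 0.5 * np.log2(2 * np.pi * np.e)  # differential entropy of N(0, 1) in bits
print(H_binned)          # ≈ 8.69
print(h - np.log2(eps))  # ≈ 8.69: differential entropy minus log2(eps)
```

Shrinking `eps` makes both numbers grow without bound, while their difference stays near zero.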
A more useful quantity to look at for continuous distributions is the relative entropy (also Kullback-Leibler divergence). For discrete distributions:
$$D_\text{KL}[P || Q] = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}.$$
It measures the number of extra bits used when the true distribution is $P$ but we use $-\log_2 Q(x)$ bits to encode $x$. Taking the limit of the relative entropy, we arrive at
$$D_\text{KL}[p \mid\mid q] = \int p(x) \log_2 \frac{p(x)}{q(x)} \, dx,$$
because the $\log_2 \varepsilon$ terms cancel. For continuous distributions this corresponds to the number of extra bits used in the limit of infinitesimally small bins. For both continuous and discrete distributions, this quantity is always non-negative.
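A small illustration of the cancellation (assuming, for concreteness, $p = \mathcal{N}(0,1)$ and $q = \mathcal{N}(1,1)$, whose KL divergence has the closed form $\tfrac{1}{2\ln 2}$ bits): approximating each bin probability by density times width, the $\varepsilon$ factors cancel inside the ratio $P/Q$, and the binned relative entropy converges to the continuous one:

```python
import numpy as np
from scipy.stats import norm

eps = 0.001
x = np.arange(-10.0, 10.0, eps) + eps / 2  # bin centres
P = norm.pdf(x, loc=0.0) * eps             # bin probability under p ≈ density * width
Q = norm.pdf(x, loc=1.0) * eps             # same under q; the eps factors cancel in P/Q
D_binned = np.sum(P * np.log2(P / Q))

D_exact = 0.5 * 1.0**2 / np.log(2)  # KL[N(0,1) || N(1,1)] in bits
print(D_binned, D_exact)            # both ≈ 0.7213
```

Unlike the binned entropy, `D_binned` does not blow up as `eps` shrinks, which is exactly why relative entropy survives the continuum limit.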
Now, we could think of differential entropy as the negative relative entropy between $p(x)$ and an unnormalized density $\lambda(x) = 1$,
$$-\int p(x) \log_2 p(x) \, dx = -D_\text{KL}[p \mid\mid \lambda].$$
Its interpretation would be the difference in the number of bits obtained by using $-\log_2 \int_{n\varepsilon}^{(n + 1)\varepsilon} p(x) \, dx$ bits to encode the $n$-th interval instead of $-\log_2 \varepsilon$ bits. Even though the former is optimal, the difference can now be negative, because $\lambda$ is cheating (by not integrating to 1) and can therefore assign fewer bits on average than is theoretically possible.
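For a concrete negative value (a toy check; the uniform distribution here is my own choice of example): take $X$ uniform on $[0, \tfrac12]$, so $f(x) = 2$ on that interval and $h(X) = -\int_0^{1/2} 2 \log_2 2 \, dx = -1$ bit. Encoding with $p$ saves one bit per interval relative to the $-\log_2 \varepsilon$ bits implied by $\lambda$:

```python
import numpy as np

eps = 1e-4
x = np.arange(0.0, 0.5, eps) + eps / 2
f = np.full_like(x, 2.0)           # density of Uniform[0, 0.5]
h = -np.sum(f * np.log2(f)) * eps  # Riemann sum for -∫ f log2(f) dx
print(h)  # -1.0: the differential entropy is negative
```

Squeezing the support further (say to $[0, 2^{-k}]$) drives the differential entropy to $-k$ bits, with no lower bound.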
See Sergio Verdu's talk for a good introduction to relative entropy.