How Entropy scales with sample size


For a discrete probability distribution, the entropy is defined as: $$H(p) = -\sum_i p(x_i) \log(p(x_i))$$ I'm trying to use the entropy as a measure of how "flat / noisy" vs. "peaked" a distribution is, where smaller entropy corresponds to more "peakedness". I want to use a cutoff threshold to decide which distributions are "peaked" and which are "flat". The problem with this approach is that for "same shaped" distributions, the entropy differs with the sample size. As a simple example, take the uniform distribution; its entropy is: $$p_i = \frac{1}{n}\ \ \to \ \ H = \log n$$ To make things worse, there doesn't seem to be a general rule for more complex distributions.
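To see the problem concretely, here is a short sketch (using NumPy; the sample sizes are arbitrary) showing that the "same shaped" uniform distribution gets a different entropy for every $n$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p_i log(p_i), natural log."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 * log(0) -> 0
    return -np.sum(p * np.log(p))

# Uniform distributions of different sizes: the shape is identical,
# but the entropy grows as log(n).
for n in (10, 100, 1000):
    p = np.full(n, 1.0 / n)
    print(n, entropy(p), np.log(n))  # the last two columns match
```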

So, the question is:

How should I normalize the entropy so that I get the same "scaled entropy" for "same" distributions irrespective of the sample size?


On BEST ANSWER

A partial answer for further reference:

In short, use the integral formulation of the entropy and pretend that the discrete distribution is sampling a continuous one.

Thus, create a continuous distribution $p(x)$ whose integral is approximated by the Riemann sum of the $p_i$'s: $$\int_0^1 p(x)dx \sim \sum_i p_i\cdot \frac{1}{N} = 1$$ This means that the $p_i$'s must first be normalized so that $\sum_i p_i = N$.
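In code this renormalization is just a multiplication by $N$ (a sketch; the probability vector `p` is an arbitrary example):

```python
import numpy as np

# Original probabilities sum to 1. Multiplying by N makes them sum
# to N, so the Riemann sum (1/N) * sum(p_i) approximates
# integral p(x) dx = 1 on [0, 1].
p = np.array([0.1, 0.2, 0.3, 0.4])
q = p * p.size        # rescaled values: [0.4, 0.8, 1.2, 1.6]
print(q.sum())        # ≈ 4.0, i.e. N
```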

After normalization, we calculate the entropy: $$H=-\int_0^1 p(x)\log\left(p(x)\right)dx \sim -\sum_i p_i \log(p_i)\cdot \frac{1}{N}$$

As $N\to \infty$ this gives an entropy that depends only on the shape of the distribution and not on $N$. For small $N$, the discrepancy depends on how well the Riemann sum approximates the integral for the given $N$.
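The whole procedure can be sketched as follows (NumPy assumed; the triangular test density $p(x) = 2x$ is my own example, chosen because its differential entropy $-\int_0^1 2x\log(2x)\,dx = \tfrac12 - \log 2$ is easy to check). The uniform distribution now scores exactly $0$ for every $N$, and other shapes converge to an $N$-independent value:

```python
import numpy as np

def scaled_entropy(p):
    """Approximate -integral p(x) log(p(x)) dx on [0, 1]:
    rescale p_i so sum(p_i) = N, then take the Riemann sum
    -(1/N) * sum(q_i log(q_i))."""
    p = np.asarray(p, dtype=float)
    n = p.size
    q = p * n           # renormalize: sum(q_i) = N
    q = q[q > 0]        # convention: 0 * log(0) -> 0
    return -np.sum(q * np.log(q)) / n

# Uniform: q_i = 1, so the scaled entropy is 0.0 for every n.
for n in (10, 100, 1000):
    print(scaled_entropy(np.full(n, 1.0 / n)))  # → 0.0 each time

# Discretized triangular density p(x) = 2x at bin midpoints:
# converges to 1/2 - log(2) ≈ -0.1931 regardless of n.
for n in (10, 100, 1000):
    x = (np.arange(n) + 0.5) / n
    print(scaled_entropy(2 * x / n))
```

Note that, like differential entropy generally, the result can be negative for peaked shapes; it is the relative ordering (and the threshold you pick) that matters for the "peaked" vs. "flat" decision.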