What is the condition for a guaranteed reduction in entropy when shrinking the sample space


As an example, a fair die with 6 faces, each face corresponding to a number 1-6, has entropy $H(X) = -\sum_{i=1}^{6} \frac{1}{6} \log_2\left(\frac{1}{6}\right) \approx 2.58 $. A fair die with only 5 faces, each corresponding to a number 1-5, has entropy $H(X') = -\sum_{i=1}^{5} \frac{1}{5} \log_2\left(\frac{1}{5}\right) \approx 2.32 $. In this case, reducing the size of the sample space reduced the random variable's entropy. However, this is not true in general (think of an extremely unfair die with $P(1) = 0.995$ and $P(2) = \cdots = P(6) = 0.001$, and then remove face 1). I wonder whether there is an existing result in information theory that gives a condition for when this reduction in entropy holds for discrete random variables. My intuition is that we need to place a constraint on the outcome with the highest probability: in particular, its probability shouldn't exceed that of the least likely outcome by more than some factor that scales with the size of the sample space.
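To make the two examples concrete, here is a quick numerical check (a sketch with an inline `entropy` helper; all numbers are the ones quoted above):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Fair 6-sided vs. fair 5-sided die: entropy drops.
print(entropy([1/6] * 6))  # ≈ 2.585
print(entropy([1/5] * 5))  # ≈ 2.322

# Extremely unfair die: removing the dominant face 1 leaves the
# uniform distribution on 5 faces, so entropy *increases*.
unfair = [0.995] + [0.001] * 5
print(entropy(unfair))       # ≈ 0.057
print(entropy([1/5] * 5))    # ≈ 2.322 after removing face 1
```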


1 Answer


Suppose WLOG that $X$ takes $n$ values in $\{1,2,\dots,n\}$, and let $Y$ be the variable obtained by removing the value $n$ (i.e. $X$ conditioned on $X \neq n$). Let $p_n = P(X=n)$.

Then $X$ can be viewed as a non-overlapping mixture of two distributions: one on the $n-1$ remaining values (the rv $Y$) and one concentrated on the single value $n$. By the grouping property of entropy:

$$H(X) = h(p_n) + p_n \times 0 + (1-p_n) H(Y)=h(p_n) + (1-p_n) H(Y) \tag1$$
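The identity $(1)$ is easy to verify numerically (a quick sketch on random distributions, treating the last value as the one removed):

```python
import math
import random

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def h(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

random.seed(1)
for _ in range(1000):
    n = random.randint(2, 10)
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]
    pn = p[-1]
    y = [x / (1 - pn) for x in p[:-1]]  # distribution of Y = X | X != n
    # identity (1): H(X) = h(p_n) + (1 - p_n) H(Y)
    assert abs(entropy(p) - (h(pn) + (1 - pn) * entropy(y))) < 1e-9
```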

where $h(\cdot)$ is the binary entropy function. Substituting $H(Y)$ from $(1)$ and rearranging gives

$$H(X) > H(Y) \iff H(X) < \frac{h(p_n)}{p_n} =- \log(p_n) - \frac{1-p_n}{p_n}\, \log(1-p_n) \tag2$$
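A numerical sketch of the equivalence in $(2)$, comparing the direct entropy comparison against the criterion $H(X) < h(p_n)/p_n$ on random distributions (helper names are my own):

```python
import math
import random

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_drops(p):
    """True iff removing the last value of p (and renormalizing) lowers entropy."""
    pn = p[-1]
    y = [x / (1 - pn) for x in p[:-1]]
    return entropy(p) > entropy(y)

def criterion(p):
    """Condition (2): H(X) < h(p_n) / p_n."""
    pn = p[-1]
    return entropy(p) < h(pn) / pn

random.seed(2)
for _ in range(1000):
    n = random.randint(2, 8)
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]
    assert entropy_drops(p) == criterion(p)
```

For instance, for the unfair die above with the dominant face listed last, both sides of the equivalence are false: removing that face increases entropy.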

In particular, if $X$ is uniform, $p_n = 1/n$ and the inequality takes the form

$$ \log n < \log n - (n-1)\log(1-1/n) $$

which always holds, since $\log(1-1/n) < 0$.
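The uniform case can also be checked directly (a quick sketch; base-2 logs, which doesn't affect the inequality):

```python
import math

# Uniform X on n values: H(X) = log2(n), p_n = 1/n.
# Criterion (2) becomes  log n < log n - (n-1) log(1 - 1/n),
# which holds because log(1 - 1/n) < 0.
for n in range(2, 100):
    lhs = math.log2(n)
    rhs = math.log2(n) - (n - 1) * math.log2(1 - 1 / n)
    assert lhs < rhs
```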

I'm not sure what more can be said of $(2)$ in general.