Entropy of union of multisets


Assigning a random variable to a multiset: Assume that $S$ is a multiset. We can think of $S$ as a record of independent samples from some random variable. For instance, $S = \{H, H, T, T, T\}$ can be thought of as five tosses of a coin. If the multiset is "large enough", we can estimate the distribution of the random variable (using a frequentist approach). For instance, given $S = \{H:500, T:1000\}$, we may estimate that this multiset was generated by 1500 independent tosses of a (biased) coin, with probability of heads $\tfrac{1}{3}$ and probability of tails $\tfrac{2}{3}$.
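The frequentist estimate above is just "normalize the counts and apply the Shannon entropy formula". A minimal sketch in Python (the function name `entropy` is my own choice, not from the question):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution of a
    multiset, given as a mapping from element to count."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# The biased-coin multiset from above: 500 heads, 1000 tails.
S = {"H": 500, "T": 1000}
print(entropy(S))  # entropy of a (1/3, 2/3) coin, about 0.918 bits
```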

Consider two multisets $S_1$ and $S_2$. Assign random variables $X$ and $Y$ to $S_1$ and $S_2$, as described above.

Let $S_3$ be the union of $S_1$ and $S_2$. Assign a random variable $Z$ to $S_3$.

How is the entropy of $Z$ related to the entropy of $X$ and $Y$?

In particular, can we prove that:

$$\max\{H(X), H(Y)\} \le H(Z) \le H(X) + H(Y)$$

Example:

$$S_1 = \{a:2000, b:3000, c:5000\}$$

$$S_2 = \{a:7000, d:4000\}$$

$$S_3 = \{a:9000, b:3000, c:5000, d:4000\}$$

$$H(X) = 1.5, \qquad H(Y) = 0.9, \qquad H(Z) = 1.9$$
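These values (rounded to one decimal, in bits) can be checked directly from the counts. A quick sketch, reusing the empirical-entropy idea from the question:

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of the normalized counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

S1 = {"a": 2000, "b": 3000, "c": 5000}
S2 = {"a": 7000, "d": 4000}
# Multiset union: counts of shared elements add up.
S3 = {k: S1.get(k, 0) + S2.get(k, 0) for k in S1.keys() | S2.keys()}

print(round(entropy(S1), 1), round(entropy(S2), 1), round(entropy(S3), 1))
# prints 1.5 0.9 1.9
```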


This is the entropy of a mixture of (two) distributions, where the probability of sampling from distribution number $i$ is proportional to the sum $W_i$ of the weights of the elements of multiset $i$ (for $i=1,2$).

The entropy is $\sum f(p)$, where $f(p)=-p\log p$ and $p$ runs through the probabilities of the different elements; here it is $\sum f(tp_1 + (1-t)p_2)$, where $t = W_1/(W_1+W_2)$ and $p_1, p_2$ are the probabilities of an element under distributions 1 and 2. Since $f$ is concave, this is at least $t\sum f(p_1) + (1-t)\sum f(p_2) = tH_1 + (1-t)H_2$, where $H_i$ is the entropy of distribution $i$; and this in turn is at least $\min(H_1, H_2)$.

There is no lower bound (such as $\max H_i$) that is independent of the weights in the mixture, other than the trivial $\min H_i$. No larger lower bound is possible, because by skewing the weights the mixture can be made arbitrarily close to any one of its pure parts, including the one with the lowest entropy. That the minimum is a lower bound was proved above.
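A concrete counterexample to the proposed $\max\{H(X), H(Y)\}$ lower bound, sketched in Python: a maximal-entropy two-element multiset whose weight is negligible next to a huge single-element (zero-entropy) multiset.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of the normalized counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# S1 is uniform on two symbols (entropy 1 bit), but carries almost no
# weight next to the one-symbol, zero-entropy multiset S2.
S1 = {"a": 1, "b": 1}
S2 = {"c": 1_000_000}
S3 = {"a": 1, "b": 1, "c": 1_000_000}

print(entropy(S1), entropy(S2))  # 1 bit vs. 0 bits
print(entropy(S3))               # far below max(H(X), H(Y)) = 1
```

Here $H(Z)$ is tiny (on the order of $10^{-5}$ bits), so $H(Z) \ge \max\{H(X), H(Y)\}$ fails badly; only $H(Z) \ge \min\{H(X), H(Y)\}$ survives.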

An upper bound is the number of bits needed to encode "which distribution" plus the maximum number of bits needed to encode an element from that distribution. Without knowledge of the weights in the mixture, this can be as large as $\max H_i + \log n$, where $n$ is the number of distributions in the mixture; the $\log n$ term is the entropy of the uniform distribution on $n$ elements. Naturally, the $\log n$ term can be improved if the weights of the mixture are known, by replacing it with the entropy of the weight distribution itself.
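Both bounds can be checked numerically on the question's example: the concavity lower bound $tH_1 + (1-t)H_2$, and the mixture upper bound $tH_1 + (1-t)H_2 + H(t)$, where $H(t)$ is the binary entropy of the weight split (here $n=2$, so $H(t) \le \log 2 = 1$ bit). A sketch:

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of the normalized counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def binary_entropy(t):
    """Entropy (in bits) of a Bernoulli(t) weight split."""
    return -t * log2(t) - (1 - t) * log2(1 - t)

S1 = {"a": 2000, "b": 3000, "c": 5000}
S2 = {"a": 7000, "d": 4000}
S3 = {"a": 9000, "b": 3000, "c": 5000, "d": 4000}

W1, W2 = sum(S1.values()), sum(S2.values())
t = W1 / (W1 + W2)

lower = t * entropy(S1) + (1 - t) * entropy(S2)  # concavity lower bound
upper = lower + binary_entropy(t)                # mixture upper bound
assert lower <= entropy(S3) <= upper
assert upper <= max(entropy(S1), entropy(S2)) + log2(2)
```

On this example the chain works out to roughly $1.20 \le 1.87 \le 2.20 \le 2.49$ bits.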