Can we approximate the most likely event's probability if we know the distribution's entropy?


Suppose that we know that a distribution $p$ over a small number $n$ of elements has entropy $H(p)=0.01$. I intentionally chose a small number. Is there some kind of inequality or approximation for the probability $p_i$, where $i = \arg\max_j p_j$?

What I came up with so far is the following inequality: $H(p) = \sum _j -p_j \log(p_j) \geq -p_i \log(p_i)$. In principle this gives an inequality for $p_i$: if we assume that the most likely event has probability larger than $0.5$ (a consequence of the entropy being sufficiently small for a given $n$), then $H(p) \geq -p_i \log(p_i)$ should yield a bound on $p_i$ in terms of the entropy. I'm not sure how to solve that inequality for $p_i$, though, or whether it's even possible. An approximation that works for small entropy would help as well.

Is there a bound or approximation on the maximum probability $p_i$ in terms of the entropy, that works for sufficiently small entropy?


BEST ANSWER

I don't think this can be solved analytically in terms of elementary functions, but since $z \mapsto -z \ln z$ is strictly decreasing for $z > 1/e,$ one can numerically invert your relation to get a lower bound on $p_{i}.$ For small $H$, this bound is quite close to $1.$

A slightly better approach is to observe that $$ H(X) = \sum p_j \log \frac{1}{p_j} \ge \sum p_j \log \frac{1}{p_i} = \log \frac{1}{p_i},$$ which immediately tells us that $p_i \ge \exp(-H)$. Note that this improves upon inverting $-p\log p = H$ since $-\log p \ge -p\log p$.
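A minimal sketch of both lower bounds so far (function names are my own; entropies in nats): bisection inverts $-p \ln p = H$ on $(1/e, 1)$, where the map is decreasing, and the result is compared against the closed-form bound $e^{-H}$.

```python
import math

def invert_neg_plogp(H, tol=1e-12):
    """Find p in (1/e, 1) with -p*ln(p) = H; the map is decreasing there."""
    lo, hi = 1 / math.e, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if -mid * math.log(mid) > H:
            lo = mid  # -p ln p still too large -> p must be closer to 1
        else:
            hi = mid
    return 0.5 * (lo + hi)

H = 0.01  # entropy in nats
bound_inverted = invert_neg_plogp(H)  # lower bound from inverting -p ln p = H
bound_exp = math.exp(-H)              # lower bound p_i >= exp(-H)
print(bound_inverted, bound_exp)      # exp(-H) is the larger, i.e. better, bound
```

The final comparison reflects the point above: since $-\log p \ge -p \log p$, the bound $e^{-H}$ always dominates the inverted one.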

We can in fact do better. Let $Y = \mathbf{1}\{X = i\}$. Since $Y$ is a function of $X$, the data processing inequality tells us that $H(Y) \le H(X)$. But $Y$ is binary, so we can (again numerically, but not analytically) invert the entropy to conclude that $p_{i}$ is either smaller than $h_2^{-1}(H(X))$ or bigger than $1- h_2^{-1}(H(X)),$ where $h_2^{-1} \colon [0, \log 2] \to [0, 1/2]$ is the inverse of the binary entropy function restricted to $[0, 1/2]$. If we know that $p_{i} > 1/2,$ this directly gives a lower bound. For small $H$ this should be the tightest possible lower bound, because the random variable could well be supported on just two letters (or place arbitrarily small mass on all but two), in which case $p_i$ must be bounded away from $1$ by enough to supply the required binary entropy.
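A sketch of this binary-entropy bound (names are my own; entropies in nats): invert $h_2$ by bisection on $[0, 1/2]$, where it is increasing, and take $p_i \ge 1 - h_2^{-1}(H)$.

```python
import math

def h2(q):
    """Binary entropy in nats; h2(0) = h2(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def h2_inv(H, tol=1e-12):
    """Inverse of h2 on [0, 1/2], where h2 is increasing."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < H:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

H = 0.01               # entropy in nats
lower = 1 - h2_inv(H)  # valid lower bound on p_i when p_i > 1/2
print(lower)
```

For $H = 0.01$ nats this bound is strictly larger than $e^{-H} \approx 0.99$, illustrating that it is the tighter of the two.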


Of course, if we're going numerical, then we might as well set up the relevant optimisation program in terms of $H$. Happily, considering the following program yields an upper bound: \begin{align} p_{\max}(H,n) := &\max p_1 \\ &\textrm{s.t. } \sum_{i = 1}^n p_i \log p_i = - H, \\ &\phantom{\textrm{s.t. }} \sum_{i = 1}^n p_i = 1,\\ &\phantom{\textrm{s.t. }} \forall i, p_i \ge 0. \end{align}

This has the Lagrangian $$ \mathcal{L} = p_1 + \lambda (\sum p_i \log p_i + H) + \mu(\sum p_i - 1),$$ which yields the first order conditions $$ 1 + \lambda + \lambda \log p_1 + \mu = 0 \\ \forall i \ge 2, \lambda + \lambda \log p_i + \mu = 0,$$ which in turn tells us that the solutions must be of the form $$ (p_1, \pi, \pi, \dots, \pi), \pi = \frac{1-p_1}{n-1},$$ which should make sense - if we want $p_1$ to be big but need to meet an entropy constraint, then the best thing to do is to spread out the remaining mass uniformly on the remainder of the support, to make the relevant conditional entropy as large as possible. In any case, using the above, we may conclude that $p_{\max}$ must be the largest solution in $[0,1]$ to the equation $$ - p \log p - (1-p) \log \frac{1-p}{n-1} = H \iff h_2(p) + (1-p) \log(n-1) = H.$$

Of course, by definition, we get the upper bound $ p_i \le p_{\max}(H,n).$ I don't quite know how convenient this is, but again, $p_{\max}$ is reasonable to solve for over $[1/2, 1]$ via something like Newton's method or binary search.
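A sketch of that numerical solve (names are my own; entropies in nats). The left-hand side $h_2(p) + (1-p)\log(n-1)$ decreases from $\log n$ at $p = 1/n$ to $0$ at $p = 1$, so plain bisection on $[1/n, 1]$ finds the largest root.

```python
import math

def h2(q):
    """Binary entropy in nats."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def p_max(H, n, tol=1e-12):
    """Largest p in [1/n, 1] with h2(p) + (1-p)*ln(n-1) = H.

    The left-hand side decreases from ln(n) at p = 1/n to 0 at p = 1,
    so bisection applies directly (requires 0 <= H <= ln(n))."""
    f = lambda p: h2(p) + (1 - p) * math.log(n - 1)
    lo, hi = 1.0 / n, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > H:
            lo = mid  # uniform tail carries too much entropy -> p_1 larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(p_max(0.01, 4))  # upper bound on the max probability for H = 0.01 nats, n = 4
```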

As mentioned above, in principle we can also set up a minimisation program (which would minimise $p_1$, but under the extra constraints that $p_1 - p_i \ge 0$ for each $i$), but this should just give the binary entropy inversion solution from above.

ANOTHER ANSWER

Notice that $$H(X) = h(p_j) + (1-p_j) H(X \backslash x_j) \tag 1$$

where $h$ is the binary entropy function and $H(X \backslash x_j)$ is the entropy over the remaining values (equivalently, $H(X \mid X \ne x_j)$).

This gives lower and upper bounds on the entropy in terms of $p_j$:

$$ h(p_j) \le H(X) \le h(p_j) + (1 - p_j) \log(n-1) \tag 2 $$

where $n=|{\cal X}|$ is the cardinality of the support of $X$. (Of course, the logarithm should be in base $2$ if the entropy is expressed in bits.)

[Graph: the bounds in $(2)$, entropy versus $p_j$.]

The graph displays the lower bound of $(2)$ (green) and the upper bound for $n=3$ (blue) and $n=9$ (red), as $p_j \to 1$. The entropy, on the vertical axis, is in bits.

The permissible pairs $(H, p_j)$ lie between the green and the red/blue lines. Hence $(2)$ gives lower and upper bounds for $p_j$, which can be computed numerically.

A semiempirical approximation to the lower bound, for the range $0.01 \le H < 0.4$: $$ p_j \approx 1- \frac{a H}{b - \log(H)}$$

where $a=0.73$ and $b = 2.3$.
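A quick sketch checking this approximation (names are my own; I assume all logarithms here are base $2$, matching the bits convention above), compared against the exact lower bound $1 - h_2^{-1}(H)$ obtained by inverting the binary entropy numerically:

```python
import math

def h2_bits(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def h2_inv_bits(H, tol=1e-12):
    """Inverse of binary entropy (bits) on [0, 1/2], where it is increasing."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2_bits(mid) < H:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_approx(H, a=0.73, b=2.3):
    """Semiempirical lower-bound approximation, 0.01 <= H < 0.4 (bits assumed)."""
    return 1 - a * H / (b - math.log2(H))

for H in (0.01, 0.1, 0.3):
    print(H, p_approx(H), 1 - h2_inv_bits(H))  # approximation vs. exact inversion
```

Under the base-2 assumption, the two columns agree to within a few parts in a thousand across the stated range.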