How is the entropy of uniformly distributed probability mass functions distributed?


This question is quite similar to this one but not exactly the same.

Suppose we have a random vector $P \in \mathbb{R}^n$ with $P_i \ge 0$ and $\|P\|_1 = 1$, which represents a probability mass function over a finite set of outcomes $X = \{X_1, \dots, X_n\}$. This vector is distributed according to a flat Dirichlet distribution with parameters $\alpha_i = 1$: $$P \sim \mathrm{Dirichlet}(n, \alpha = (1, \dots, 1)),$$ which is the uniform distribution of points over the $(n-1)$-simplex, and whose marginals are $$P_i \sim \mathrm{Beta}(1, n - 1).$$

Given one of these $p$'s, we can calculate its entropy $$H(p) = \mathbb{E}_p[-\log_n p] = \sum_{i=1}^n -p_i\log_n(p_i) \in [0, 1].$$ My question is: what is the distribution of the random variable $H(P)$?
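Empirically, the shape of this distribution can be sketched by simulation (a minimal sketch assuming NumPy; the choices of $n$ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4            # number of outcomes (arbitrary choice)
m = 100_000      # number of Monte Carlo draws

# Draw m points uniformly from the (n-1)-simplex, i.e. flat Dirichlet(1, ..., 1).
P = rng.dirichlet(np.ones(n), size=m)          # shape (m, n)

# Entropy in base n, so H lies in [0, 1]; the convention 0 * log 0 = 0 is enforced.
with np.errstate(divide="ignore", invalid="ignore"):
    H = np.where(P > 0, -P * np.log(P), 0.0).sum(axis=1) / np.log(n)

print(H.min(), H.max(), H.mean())
```

For the flat Dirichlet the mean entropy in nats is known in closed form, $\psi(n+1) - \psi(2) = \sum_{k=2}^{n} 1/k$, so the sample mean should land near that value divided by $\log n$; the full distribution, as far as I know, has no simple closed form.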

I am not sure whether it is possible to go step by step: first find the distribution of $-P_i \log_n(P_i)$, and then the distribution of the sum over all $i$ (the terms are correlated). From the marginal distribution of $P_i$, I've noticed that $-\log(1 - P_i)$ is distributed as $\mathrm{Exponential}(n - 1)$, which might be useful. Going the other way, I have tried to get something out of $H(P) = \mathbb{E}_P[-\log_n P]$, where the expected value itself is a random variable, but whether this even makes sense is not clear to me.
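The $-\log(1-P_i) \sim \mathrm{Exponential}(n-1)$ observation is easy to check numerically (a sketch assuming NumPy; $n$ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
m = 200_000

# Marginal of a single coordinate of a flat Dirichlet(1, ..., 1) is Beta(1, n-1).
p_i = rng.beta(1, n - 1, size=m)
y = -np.log(1.0 - p_i)

# An Exponential with rate n-1 has mean 1/(n-1) and standard deviation 1/(n-1).
print(y.mean(), y.std(), 1.0 / (n - 1))
```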

1 Answer


Not an answer / too long for a comment

I am not an expert on the Dirichlet distribution, so please let me start with a question: if we condition on the last coordinate $p_n = 1 - \lambda$ and re-scale the rest via

$$q_i = {p_i \over \lambda} \quad \forall i \in \{1, 2, \dots, n-1\},$$

so that $\sum_{i=1}^{n-1} q_i = 1$, would the $q_i$ be distributed according to $\mathrm{Dirichlet}(n-1, (1,1,\dots,1))$ and be independent of $p_n$? Intuitively it seems this should be true, based on the description of the Dirichlet as uniform on vectors with $\|p\|_1 = 1$, but hopefully someone more knowledgeable can confirm this.
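This rescaling property does in fact hold for the Dirichlet (it follows from its representation via independent Gamma variables), and a quick simulation is consistent with it (a sketch assuming NumPy; $n$ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
m = 200_000

P = rng.dirichlet(np.ones(n), size=m)
p_n = P[:, -1]
Q = P[:, :-1] / (1.0 - p_n)[:, None]     # rescaled remaining coordinates

# If Q ~ Dirichlet(n-1, (1, ..., 1)) independently of p_n, then each Q_i is
# marginally Beta(1, n-2) with mean 1/(n-1), and uncorrelated with p_n.
print(Q[:, 0].mean(), 1.0 / (n - 1))
print(np.corrcoef(Q[:, 0], p_n)[0, 1])
```

Near-zero correlation is of course only a necessary condition for independence, not a proof, but it is a cheap sanity check.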

Anyway, if the above is true, then maybe induction on $n$ would be a viable approach? This is inspired by the OP's "step by step" comment, and working out some details.

$$ \begin{aligned} H_{n-1}(q) &= \sum_{i=1}^{n-1} -q_i \log q_i \\ &= \sum_{i=1}^{n-1} - {p_i \over \lambda} \log {p_i \over \lambda} \\ &= {1 \over \lambda} \sum_{i=1}^{n-1} -p_i (\log p_i - \log \lambda)\\ &= {1 \over \lambda} \Big(\sum_{i=1}^{n-1} (-p_i \log p_i) + \log \lambda \sum_{i=1}^{n-1} p_i\Big)\\ &= {1 \over \lambda} \big(H_n(p) + p_n \log p_n + (1-p_n) \log \lambda\big)\\ &= {1 \over 1 - p_n} \big(H_n(p) + p_n \log p_n + (1-p_n) \log (1-p_n)\big) \\ H_n(p) &= (1 - p_n) H_{n-1}(q) - p_n \log p_n - (1-p_n) \log (1-p_n) \end{aligned} $$

So, assuming independence: we know the distribution of $p_n$, and by induction the distribution of $H_{n-1}(q)$, and the last equation shows how to combine these two random variables into the new random variable $H_n(p)$. As the OP suspected, it does not seem to be as simple as summing the $-p_i \log p_i$ terms. The combination above seems very complicated, though, so I am not sure there is a good way to proceed (unless you are only interested in simple quantities like $E[H(p)]$).
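The recursion can be sanity-checked numerically, using $(1-p_n)\log(1-p_n)$ as the last term: note that $\log\lambda \sum_{i=1}^{n-1} p_i = (1-p_n)\log(1-p_n)$, so the last two terms together are the binary entropy $-p_n\log p_n - (1-p_n)\log(1-p_n)$, and the final line is just the grouping (chain-rule) property of entropy. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

n = 6
p = rng.dirichlet(np.ones(n))
p_n = p[-1]
q = p[:-1] / (1.0 - p_n)

lhs = entropy(p)
rhs = (1.0 - p_n) * entropy(q) - p_n * np.log(p_n) - (1.0 - p_n) * np.log(1.0 - p_n)
print(lhs, rhs)   # the two should agree
```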

Anyway, hope this is somewhat helpful.