Maximise entropy of random sample of random distribution


Let $X_1, \ldots, X_n$ be positive discrete r.v.s such that $p^j_k = \mathbb{P}(X_j=k)$. Let $N \in \{ 1, \ldots, n\}$ be a discrete r.v. with pmf $a_j=\mathbb{P}(N=j)$, independent of the $X_j$. Then $X_N$ is the random variable obtained by first sampling an index $j$ according to $a$ and then drawing from the distribution $p^j$.

Calculate $H(N, X_N)$ and, for $p^j_k$ fixed, find its maximum value and the value of $a_1, ..., a_n$ which achieves it.

I believe $$ H(N,X_N)=-\sum_{j=1}^n\sum_{k=1}^\infty a_jp^j_k\log(a_jp^j_k)$$
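This formula can be sanity-checked numerically: by the chain rule for entropy it should equal $H(N)+\sum_j a_j H(X_j)$. A minimal sketch with made-up pmfs (finite support chosen for simplicity):

```python
import numpy as np

# Hypothetical small example: n = 2 conditional pmfs with finite support.
p = np.array([[0.5, 0.5, 0.0],   # pmf of X_1
              [0.2, 0.3, 0.5]])  # pmf of X_2
a = np.array([0.4, 0.6])         # pmf of N

def entropy(q):
    q = q[q > 0]                 # convention: 0 log 0 = 0
    return -np.sum(q * np.log(q))

# Joint pmf of (N, X_N): entry (j, k) is a_j * p[j, k].
joint = a[:, None] * p
H_joint = entropy(joint.ravel())

# Chain rule: H(N, X_N) = H(N) + sum_j a_j H(X_j).
H_chain = entropy(a) + np.sum(a * np.array([entropy(row) for row in p]))

print(abs(H_joint - H_chain) < 1e-12)  # → True
```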

I would guess that this is maximised by $$a_j=\frac{H(X_j)}{\sum_{j=1}^nH(X_j)}$$ simply because it seems like giving a higher probability to the r.v.s with more entropy would maximise the entropy. How can I prove (or disprove) this, preferably using Lagrange multipliers?

Best answer:

Your problem can be stated as:

$$\max_{a\in K} \; -\sum_{j=1}^n \sum_{k=1}^{\infty} a_j p^j_k\log(a_j p^j_k)$$

where $K=\{a\in\mathbb{R}^n \mid a_1,\ldots,a_n\ge 0,\ a_1+\ldots+a_n=1\}$. This is a particularly well-behaved convex programme (the objective is concave and the constraints are affine), so the KKT conditions are both necessary and sufficient for optimality. Consider:

$$\begin{align} f(a)&=-\sum_{j=1}^n \sum_{k=1}^{\infty} a_j p^j_k\log(a_j p^j_k)\\ g_i(a)&=-a_i\\ h(a)&=a_1+\ldots+a_n-1\end{align}$$

Notice that (using $\sum_k p^j_k = 1$ and $H(X_j)=-\sum_k p^j_k\log p^j_k$):

$$\begin{align}\nabla f(a)&=\Bigl(-\sum_{k=1}^{\infty} p^j_k\bigl(1+\log(a_j p^j_k)\bigr)\Bigr)^T_{1\le j\le n}=(-1-\log a_j+H(X_j))^T_{1\le j\le n}\\ \nabla g_i(a)&=-e_i\\ \nabla h(a)&=(1,\ldots,1)^T \end{align}$$
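As a quick numerical sanity check (the pmfs below are made up for illustration), the components of $\nabla f$ can be compared against central finite differences:

```python
import numpy as np

# Made-up example: two conditional pmfs and an interior point a.
p = np.array([[0.5, 0.5],
              [0.1, 0.9]])
a = np.array([0.3, 0.7])

def f(a):
    # f(a) = -sum_{j,k} a_j p^j_k log(a_j p^j_k), with 0 log 0 = 0
    x = (a[:, None] * p).ravel()
    x = x[x > 0]
    return -np.sum(x * np.log(x))

# Entropies H(X_j) and the claimed gradient -1 - log a_j + H(X_j).
H = np.array([-np.sum(r[r > 0] * np.log(r[r > 0])) for r in p])
grad_analytic = -1 - np.log(a) + H

# Central finite differences along each coordinate.
eps = 1e-6
grad_fd = np.array([(f(a + eps * e) - f(a - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # → True
```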

Then at optimal $a^*$:

  • Stationarity: $-1-\log a^*_j+H(X_j)-\mu_j-\lambda=0$ for all $j$
  • Primal feasibility: $a^*_j\ge 0$ for all $j$ and $a^*_1+\ldots +a^*_n=1$
  • Dual feasibility: $\mu_j\ge 0$ for all $j$
  • Complementary slackness: $\mu_j a^*_j=0$ for all $j$

Stationarity gives $a^*_j=e^{H(X_j)-1-\lambda-\mu_j}>0$, so complementary slackness forces $\mu_j=0$ for every $j$, and hence $a^*_j=Ce^{H(X_j)}$ for some constant $C>0$. Normalising so the $a^*_j$ sum to $1$:

$$a^*_j=\frac{e^{H(X_j)}}{\sum_{i=1}^n e^{H(X_i)}}$$

is the optimal solution. Substituting back, the maximum value is $$H(N,X_N)=\log\sum_{j=1}^n e^{H(X_j)}.$$ In particular your guess $a_j\propto H(X_j)$ is not optimal: the optimal weights are proportional to $e^{H(X_j)}$, not to $H(X_j)$. You can reach the same result with plain Lagrange multipliers, but then you need to handle the boundary cases $a_j=0$ by hand.
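A quick numerical check of the conclusion, with made-up pmfs: the weights $a^*_j\propto e^{H(X_j)}$ achieve the value $\log\sum_j e^{H(X_j)}$ and beat both uniform weights and the $a_j\propto H(X_j)$ guess from the question:

```python
import numpy as np

# Made-up pmfs with entropies 0, log 2, log 3.
p = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0],
              [1/3, 1/3, 1/3]])

H = np.array([-np.sum(r[r > 0] * np.log(r[r > 0])) for r in p])

def joint_entropy(a):
    # H(N, X_N) for mixing weights a, with 0 log 0 = 0.
    x = (a[:, None] * p).ravel()
    x = x[x > 0]
    return -np.sum(x * np.log(x))

a_opt = np.exp(H) / np.sum(np.exp(H))   # claimed optimum (softmax of entropies)
a_guess = H / H.sum()                   # the a_j ∝ H(X_j) guess
a_unif = np.ones(3) / 3                 # uniform weights

v_opt = joint_entropy(a_opt)
print(np.isclose(v_opt, np.log(np.sum(np.exp(H)))))  # → True
print(v_opt >= joint_entropy(a_guess))               # → True
print(v_opt >= joint_entropy(a_unif))                # → True
```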