Suppose we are given Bernoulli random variables $X_1, X_2, \ldots, X_n$ with fixed marginal distributions, i.e., success probabilities $p_i = \Pr[X_i = 1]$ for each $i \in [n]$.
Question: How can we minimize the joint entropy $H(X_1, X_2, \ldots, X_n)$ while preserving the marginal distributions?
$$H(X_1, ..., X_n) = -\sum_{x_1 \in\mathcal X_1} ... \sum_{x_n \in\mathcal X_n} P(x_1, ..., x_n) \log_2[P(x_1, ..., x_n)]$$
Intuition: First sample $s \sim \mathcal{U}(0, 1)$, then set $X_i = 1$ iff $s \leq p_i$. This coupling preserves each marginal ($\Pr[X_i = 1] = p_i$) and intuitively makes the $X_i$'s maximally dependent, which should yield the lowest joint entropy.
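A useful observation about this coupling: the thresholds $p_i$ partition $[0, 1]$ into at most $n + 1$ intervals, and each interval corresponds to exactly one outcome of $(X_1, \ldots, X_n)$, so the joint entropy can be computed exactly without enumerating all $2^n$ outcomes. A minimal sketch (the function name is mine):

```python
import math

def coupled_joint_entropy(ps):
    """Joint entropy (in bits) of the coupling X_i = 1{s <= p_i}, s ~ U(0, 1).

    Sorting the distinct thresholds p_i together with 0 and 1 partitions
    [0, 1] into intervals; each interval of positive length is one outcome
    of (X_1, ..., X_n), with probability equal to its length.
    """
    cuts = sorted(set(ps) | {0.0, 1.0})
    masses = [b - a for a, b in zip(cuts, cuts[1:]) if b > a]
    return -sum(m * math.log2(m) for m in masses)

# Example: identical marginals collapse to a single Bernoulli.
print(coupled_joint_entropy([0.5, 0.5, 0.5]))  # 1.0 bit
```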
However, I currently cannot come up with a rigorous proof.
Note: It is known that the problem of minimizing joint entropy given marginal distributions is NP-hard in general, but I am wondering whether it is possible to have a solution for the special case we are considering.
Refs:
- Kocaoglu, Murat, et al. "Entropic causal inference." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1. 2017.
- Cicalese, Ferdinando, Luisa Gargano, and Ugo Vaccaro. "Minimum-entropy couplings and their applications." IEEE Transactions on Information Theory 65.6 (2019): 3436-3451.
Update: The construction described above does not always minimize the joint entropy. Consider $n = 2$ with $p_1 = 0.3$ and $p_2 = 0.7$: the joint entropy is minimized by setting $X_1 = 1$ iff $s \leq p_1$ and $X_2 = 1$ iff $1 - s \leq p_2$ (i.e., $s \geq 0.3$). On the other hand, one can easily check that the construction above does minimize the joint entropy when all the $p_i$'s are identical. Under what conditions on the $p_i$'s does it achieve the minimum?
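The $n = 2$ counterexample can be verified numerically. For $p_1 = 0.3$, $p_2 = 0.7$, the same-threshold coupling has support $\{(1,1), (0,1), (0,0)\}$ with masses $(0.3, 0.4, 0.3)$, while the antithetic coupling has support $\{(1,0), (0,1)\}$ with masses $(0.3, 0.7)$:

```python
import math

def entropy_bits(masses):
    # Shannon entropy in bits of a probability vector (zeros skipped).
    return -sum(m * math.log2(m) for m in masses if m > 0)

p1, p2 = 0.3, 0.7

# Same-threshold coupling: X1 = 1{s <= 0.3}, X2 = 1{s <= 0.7}.
# Outcomes: (1,1) on [0, 0.3], (0,1) on (0.3, 0.7], (0,0) on (0.7, 1].
same_threshold = [p1, p2 - p1, 1 - p2]

# Antithetic coupling: X1 = 1{s <= 0.3}, X2 = 1{1 - s <= 0.7} = 1{s >= 0.3}.
# Outcomes: (1,0) on [0, 0.3], (0,1) on (0.3, 1].
antithetic = [p1, 1 - p1]

print(entropy_bits(same_threshold))  # ≈ 1.571 bits
print(entropy_bits(antithetic))      # ≈ 0.881 bits
```

Both couplings have the required marginals, but the antithetic one attains entropy $H(0.3) \approx 0.881$ bits versus $\approx 1.571$ bits for the same-threshold coupling.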