In the discrete case (or with bounded support?), why is the entropy-maximising distribution uniform, while it is the Gaussian in the class of distributions with fixed variance?


Let $X$ be a random variable with probability mass or density function $f$ supported on a set $\mathcal{X}$. The entropy of $X$ (written $H(X)$ in the discrete case and $h(X)$ in the continuous case) is defined as

$$ H(X)=\mathrm{E}[-\log f(X)]=-\int_{\mathcal{X}} f(x) \log f(x) \, dx, $$

where $dx$ denotes the counting measure in the discrete case and the Lebesgue measure in the continuous case.

From T. M. Cover, J. A. Thomas - Elements of Information Theory (2006) :

We have for the uniform distribution the following:

Theorem 2.6.4 $H(X) \leq \log |\mathcal{X}|$, where $|\mathcal{X}|$ denotes the number of elements in the range of $X$, with equality if and only if $X$ has a uniform distribution over $\mathcal{X}$.

Proof: Let $u(x)=\frac{1}{|\mathcal{X}|}$ be the uniform probability mass function over $\mathcal{X}$, and let $p(x)$ be the probability mass function for $X$. Then $$ D(p \| u)=\sum p(x) \log \frac{p(x)}{u(x)}=\log |\mathcal{X}|-H(X) . $$ Hence by the nonnegativity of relative entropy, $$ 0 \leq D(p \| u)=\log |\mathcal{X}|-H(X) $$
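The identity in the proof is easy to check numerically. The sketch below (the pmf is an arbitrary illustrative example, not from the text) verifies that $D(p \| u) = \log|\mathcal{X}| - H(X) \geq 0$ on a 4-symbol alphabet:

```python
import numpy as np

# Check D(p || u) = log|X| - H(X) >= 0 for an arbitrary pmf p on |X| = 4 symbols.
p = np.array([0.5, 0.25, 0.15, 0.10])   # an arbitrary pmf (illustrative example)
u = np.full(4, 0.25)                    # uniform pmf u(x) = 1/|X|

H = -np.sum(p * np.log(p))              # entropy H(X), in nats
D = np.sum(p * np.log(p / u))           # relative entropy D(p || u)

print(H, D)                             # D coincides with log(4) - H, and D >= 0
```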

We have for the normal distribution the following:

Theorem 8.6.5 Let the random vector $\mathbf{X} \in \mathbf{R}^n$ have zero mean and covariance $K=E \mathbf{X X}^t$ (i.e., $K_{i j}=E X_i X_j, 1 \leq i, j \leq n$ ). Then $h(\mathbf{X}) \leq$ $\frac{1}{2} \log (2 \pi e)^n|K|$, with equality iff $\mathbf{X} \sim \mathcal{N}(0, K)$.
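A quick one-dimensional sanity check of Theorem 8.6.5 (an illustrative comparison, not from the text): among distributions with variance $\sigma^2$, the Gaussian entropy $\frac{1}{2}\log(2\pi e \sigma^2)$ exceeds the entropy of a uniform distribution with the same variance, namely $\log(2a)$ for the uniform on $[-a, a]$ with $a = \sigma\sqrt{3}$:

```python
import numpy as np

# For fixed variance sigma^2 = 1, compare the differential entropy of
# N(0, sigma^2) with that of Uniform[-a, a], where a^2 / 3 = sigma^2.
sigma2 = 1.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # Gaussian entropy, nats
a = np.sqrt(3 * sigma2)                              # matching uniform half-width
h_unif = np.log(2 * a)                               # entropy of Uniform[-a, a]

print(h_gauss, h_unif)   # the Gaussian entropy is strictly larger
```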

Does Theorem 2.6.4 also hold for continuous uniform distributions (bounded support)?

Theorem 8.6.5, by letting the support be all of $\mathbf{R}^n$, enlarges the set of admissible distributions; still, intuitively I would expect the continuous uniform distribution to be the extension of the discrete case. Why does the Gaussian appear instead?


Yes, Theorem 2.6.4 holds more generally. It's known as Gibbs' inequality, and is true e.g. for the continuous uniform distribution on any set of finite measure in $\mathbb R^n$.
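As a numerical illustration (my own example, not from the answer): on $[0,1]$, a set of Lebesgue measure 1, the uniform density has entropy $\log 1 = 0$, and any other density has strictly less. The triangular density $f(x) = 2x$ has $h(f) = \frac{1}{2} - \log 2 \approx -0.193 < 0$:

```python
import numpy as np

# Differential entropy of the triangular density f(x) = 2x on [0, 1],
# approximated by a midpoint Riemann sum (interval length 1, so the sum
# is just the mean of the integrand over the midpoints).
N = 1_000_000
x = (np.arange(N) + 0.5) / N        # midpoints of a uniform grid on [0, 1]
f = 2 * x                           # triangular density
h_tri = -np.mean(f * np.log(f))     # h(f) = -integral of f log f

print(h_tri)                        # about -0.193, below the uniform's 0
```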

Theorem 8.6.5 is different in that it fixes the covariance $K$. If $K$ were allowed to vary, there would be no maximum-entropy distribution; you can check that e.g. for $X \sim N(0, \sigma^2 I)$, the entropy $h(X) \to \infty$ as $\sigma^2 \to \infty$.
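The divergence is immediate from the closed-form entropy of a Gaussian; the sketch below just evaluates $h(X) = \frac{n}{2}\log(2\pi e \sigma^2)$ for growing $\sigma^2$:

```python
import numpy as np

# Entropy of X ~ N(0, sigma^2 I) in R^n: h(X) = (n/2) log(2*pi*e*sigma^2).
# With K unconstrained, this grows without bound as sigma^2 -> infinity.
def gauss_entropy(sigma2, n=1):
    return 0.5 * n * np.log(2 * np.pi * np.e * sigma2)

entropies = [gauss_entropy(s2) for s2 in (1.0, 1e2, 1e4, 1e6)]
print(entropies)   # strictly increasing, unbounded
```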

Asking what maximises entropy for a fixed covariance $K$ is a fundamentally different question from the unrestricted maximum, which is why you get a qualitatively different result. Even if the support were bounded, fixing $K$ would give you a Gaussian-like distribution, rather than a uniform one.
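A standard Lagrange-multiplier sketch (my addition, not part of the original answer) makes the "Gaussian-like" claim concrete: maximising $h(f) = -\int f \log f \, dx$ subject to $\int f \, dx = 1$ and $\int x x^t f \, dx = K$ gives the stationarity condition

$$ -\log f(x) - 1 + \lambda_0 + x^t \Lambda x = 0, $$

so $f(x) \propto e^{x^t \Lambda x}$ on the support: a Gaussian when the support is $\mathbf{R}^n$ (with $\Lambda = -\frac{1}{2} K^{-1}$), and a truncated Gaussian when the support is bounded.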