Minimize the entropy with zero mean and given standard deviation


Which nonnegative distribution function on the real axis with zero mean and given standard deviation minimizes the entropy? In other words,
\begin{align}
\min_{f\in \mathcal D'(\mathbb R)} H[f] &:= -\int_{-\infty}^\infty f(x)\ln f(x)\,dx, \\
\text{subject to}\quad & f(x)\ge0, \\
& \int_{-\infty}^\infty f(x)\,dx=1, \\
& \int_{-\infty}^\infty xf(x)\,dx=0, \\
& \int_{-\infty}^\infty x^2f(x)\,dx = \sigma^2,
\end{align}
where $\sigma>0$ is given.

My conjecture is that the minimizing distribution is $f_m(x) = \frac12\big(\delta(x-\sigma)+\delta(x+\sigma)\big)$. How would one rigorously prove this, or find a counterexample?


I do not think the calculus of variations works here, unlike when maximizing the entropy: the minimizer is a corner point rather than a stationary point.




I think you're right that the usual ways to check optimality would fail here. This may not be completely rigorous, but here is one line of thinking...

Let's assume $f_m$ minimizes $H[f]$. One way to check is to show that no local perturbation of this function yields a smaller entropy subject to your constraints.

Your proposed solution is a discrete measure defined on $\pm \sigma$:

$$\mu = \frac12 \sum_{a \in \{\pm \sigma\}} \delta_a$$

$$H[f_m] = -\frac12\int_{\mathbb R} \Big(\delta(x-\sigma)+\delta(x+\sigma)\Big)\ln\left(\frac12\Big(\delta(x-\sigma)+\delta(x+\sigma)\Big)\right)dx =-\int_{\mathbb R} \ln\left(\frac12\right)d\mu$$

$$ = -\ln\left(\frac12\right) +\int_{\mathbb R\setminus\{\pm\sigma\}}\ln(0)\,d\mu = -\ln\left(\frac12\right) + Q$$

What is $Q$? Thanks to measure theory, we can see it will be $0$: the set $\mathbb R\setminus\{\pm\sigma\}$ has $\mu$-measure zero, and the convention $0\cdot\ln 0 = 0$ applies, so...

$$H[f_m] = -\ln\left(\frac12\right) = \ln 2 \approx 0.69$$

Can we do better? What if we slightly adjusted the measure:

$$\mu_{d,s} = \frac{d}{2}\left[\delta_{-s}+\delta_{s}\right]+(1-d)\delta_0,\qquad d \in (0,1),\ s\geq\sigma$$

For a given $\mu_{d,s}$, the variance will be $$\sum_{k \in \{\pm s, 0\}}p_kk^2 = \frac{d}{2}(-s)^2+\frac{d}{2}s^2 + (1-d)\cdot 0^2 = ds^2$$

Given that we need to keep the variance fixed at $\sigma^2$, we can relate $s$ to the amount of probability we allocate to the non-zero atoms.

$$d\cdot s^2=\sigma^2 \implies s = \sigma\sqrt{\frac{1}{d}}$$
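As a sanity check, here is a short Python sketch (the value $\sigma=1.5$ is an arbitrary choice for illustration) confirming that setting $s=\sigma\sqrt{1/d}$ keeps the mean at $0$ and the variance at $\sigma^2$ for any $d\in(0,1)$:

```python
import math

# Sanity check: for the three-atom measure mu_{d,s} with weights
# {-s: d/2, 0: 1-d, +s: d/2}, choosing s = sigma*sqrt(1/d)
# pins the variance at sigma^2 (sigma = 1.5 is an arbitrary choice).
sigma = 1.5
for d in (0.1, 0.5, 2 / 3, 0.99):
    s = sigma * math.sqrt(1 / d)
    atoms = {-s: d / 2, 0.0: 1 - d, s: d / 2}
    mean = sum(x * p for x, p in atoms.items())
    var = sum(x * x * p for x, p in atoms.items())
    assert abs(mean) < 1e-9
    assert abs(var - sigma**2) < 1e-9
```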

Now our integral shakes out to be:

$$H[\mu_{d,s}] = -\left[d\ln\left(\frac{d}{2}\right)+ \left(1-d\right)\ln\left(1-d\right)\right]$$

This function is maximized at $d=\frac{2}{3}$, giving $H = \ln 3 \approx 1.1$ and $s=\sigma\sqrt{\frac{3}{2}} \approx 1.22\,\sigma$, with the remaining $\frac13$ placed at zero. So the maximizer is just the discrete distribution placing probability $\frac13$ at each of $-s$, $0$, $s$.

The value of this function is minimized as $d$ approaches $0$ from the right, with $\lim_{d\to 0^+} H = 0$ (as also noted in the other answer -- much more succinctly than mine ;-P)
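To see both claims numerically, here is a short Python sketch of the entropy as a function of $d$ (note that $s$ drops out of the entropy entirely):

```python
import math

def H(d):
    """Entropy of mu_{d,s}: weights d/2 at +/-s and 1-d at 0."""
    return -(d * math.log(d / 2) + (1 - d) * math.log(1 - d))

# Maximum at d = 2/3: the uniform three-point distribution, H = ln(3).
assert abs(H(2 / 3) - math.log(3)) < 1e-12
# H decreases to 0 as d -> 0+, so no d in (0,1) attains a minimum.
assert H(1e-1) > H(1e-2) > H(1e-3) > H(1e-6)
```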

Therefore, your proposed solution at $H=\ln 2\approx 0.69$ is somewhere in the middle of the pack. For example, setting $d=0.1$:

$$f_m=\frac{0.1}{2}\Big[\delta(x+\sigma\sqrt{10}) + \delta(x-\sigma\sqrt{10})\Big] + 0.9\delta(x) \implies H \approx 0.39 < 0.69$$
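A quick numerical check of this example ($\sigma$ drops out of the entropy, so no particular value needs to be fixed):

```python
import math

# Entropy of the d = 0.1 mixture: atoms of mass 0.05 at +/- sigma*sqrt(10)
# and mass 0.9 at the origin (the atom locations do not affect the entropy).
d = 0.1
H = -(d * math.log(d / 2) + (1 - d) * math.log(1 - d))
assert H < math.log(2)  # beats the conjectured minimizer's ln(2) ~ 0.69
```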


The conjecture is false. There is no minimum, but the infimum is $0$. For any $p\in (0,1)$, pick $x_1=\sigma\sqrt{\frac{1-p}p}$ and $x_2=\sigma\sqrt{\frac p{1-p}}$, and assign probability $p$ to $-x_1$ and $1-p$ to $x_2$. This distribution has mean $0$ and variance $\sigma^2$, with entropy $H[p]=-p\ln p-(1-p)\ln(1-p)\rightarrow 0^+$ as $p\rightarrow 0^+$.
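A small Python sketch of this construction, checking the moment constraints and the vanishing entropy ($\sigma = 2$ is an arbitrary choice for illustration):

```python
import math

# Two-point construction: mass p at -x1, mass 1-p at x2, with
# x1 = sigma*sqrt((1-p)/p) and x2 = sigma*sqrt(p/(1-p)).
sigma = 2.0
for p in (0.3, 0.05, 1e-4):
    x1 = sigma * math.sqrt((1 - p) / p)
    x2 = sigma * math.sqrt(p / (1 - p))
    mean = -p * x1 + (1 - p) * x2
    var = p * x1**2 + (1 - p) * x2**2
    H = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    assert abs(mean) < 1e-9            # zero mean
    assert abs(var - sigma**2) < 1e-9  # variance sigma^2
# H -> 0+ as p -> 0+, so the infimum 0 is approached but never attained.
```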


Motivation (not rigorous; I will refine it later):

For two sigma-algebras $\Sigma_1\subset \Sigma_2$ on the real axis, $$H(\Sigma_2)=H(\Sigma_1)+H(\Sigma_2|\Sigma_1).$$ The coarser sigma-algebra gives lower entropy, and we can always arrange the values of the random variable on the sigma-algebra so that the two moment constraints are satisfied. The coarsest sigma-algebra that still admits both constraints is the discrete partition generated by a random variable taking two distinct values.
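The chain rule above can be illustrated with a small discrete sketch in Python, where a fine partition (playing the role of $\Sigma_2$) is coarsened into blocks (playing the role of $\Sigma_1$); the particular weights are an arbitrary example:

```python
import math

def H(ps):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in ps if p > 0)

# Fine partition Sigma_2: four atoms; coarse partition Sigma_1: two blocks.
blocks = [[0.1, 0.2], [0.3, 0.4]]  # arbitrary example weights
fine = [p for b in blocks for p in b]
coarse = [sum(b) for b in blocks]
# Conditional entropy H(Sigma_2 | Sigma_1): average within-block entropy.
H_cond = sum(sum(b) * H([p / sum(b) for p in b]) for b in blocks)
# Chain rule: H(Sigma_2) = H(Sigma_1) + H(Sigma_2 | Sigma_1).
assert abs(H(fine) - (H(coarse) + H_cond)) < 1e-12
```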