Why do first and second moments insure stability?


In Pattern Recognition and Machine Learning, Bishop states

"Let us now consider the maximum entropy configuration for a continuous variable. In order for this maximum to be well defined, it will be necessary to constrain the first and second moments of p(x) as well as preserving the normalization constraint"

Why is it the case?

Entropy (differential entropy of a continuous variable):
$$\text{H}[x] = -\int p(x)\ln p(x)\,dx$$

Moment constraints:
$$\int_{-\infty}^{\infty} p(x)\,dx = 1,\qquad \int_{-\infty}^{\infty} x\,p(x)\,dx = \mu,\qquad \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx = \sigma^2$$

There are 2 answers below.

BEST ANSWER

Consider the uniform distribution on the interval $[-x,x]$, $x \ge 1$. Its entropy is $\ln(2x)$. As $x$ goes to infinity, the entropy goes to infinity as well, so the maximum is not well defined.
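The divergence is easy to check numerically. The sketch below (an illustration, not from the answer) approximates $-\int p\ln p\,dx$ for $\text{Unif}[-x,x]$ with a midpoint rule and compares it to the closed form $\ln(2x)$:

```python
import math

def entropy_quadrature(p, a, b, n=100_000):
    """Midpoint-rule approximation of -int_a^b p(x) ln p(x) dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        px = p(x)
        if px > 0:
            total -= px * math.log(px) * h
    return total

# The entropy of Unif[-x, x] matches ln(2x) and grows without bound.
for x in [1.0, 10.0, 100.0]:
    p = lambda t, x=x: 1.0 / (2.0 * x)  # constant density on [-x, x]
    h_num = entropy_quadrature(p, -x, x)
    assert math.isclose(h_num, math.log(2 * x), rel_tol=1e-9)
    print(f"x = {x:>6}: H = {h_num:.4f}")
```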

ANOTHER ANSWER

Many continuous distributions have infinite entropy, so it makes sense to impose some restrictions before asking which distribution has the largest entropy. For example, if we only required the zeroth moment to be 1, $\int^\infty_{-\infty} p(x)dx=1$, then $$\text{H}[\text{Unif}(x|a,b)]=\ln(b-a)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{(@PopescuClaudiu)}$$ and $$\text{H}[\mathcal{N}(y|\mu, \sigma^2)]={1\over{2}} \{1+\ln(2\pi \sigma^2)\} \ \ \ \ (\text{Bishop eqn. 1.110})$$ can each be made as large as desired by increasing $b-a$ or $\sigma^2$ respectively.

The continuous distribution having the largest entropy on a fixed domain, $x\in[a,b]$, can be found by maximizing $$-\int^b_a p(x)\ln(p(x))dx + \lambda\bigg(\int^b_a p(x)dx-1\bigg)$$ wrt $p(x)$: $$-\ln p(x)-1+\lambda=0$$ $$p(x)=\exp(\lambda-1)=c$$ $$\int^b_a p(x)dx=1\Rightarrow p(x)={1\over{b-a}}$$ as pointed out in the comment by @PopescuClaudiu.
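To sanity-check that the uniform density is the maximizer on a fixed domain, one can (this is an illustrative sketch, not part of the answer) compare its entropy against normalized perturbations of it. Any perturbed density on $[0,1]$ that still integrates to 1 should have strictly lower entropy than $\ln(b-a)=0$:

```python
import math

def entropy(p, a, b, n=200_000):
    """Midpoint-rule approximation of -int_a^b p(x) ln p(x) dx."""
    h = (b - a) / n
    s = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        px = p(x)
        if px > 0:
            s -= px * math.log(px) * h
    return s

a, b = 0.0, 1.0
h_uniform = entropy(lambda x: 1.0, a, b)  # ln(b - a) = 0

# A cosine perturbation integrates to 1 over a full period, so it is
# still a valid density -- but its entropy is strictly smaller.
for eps in [0.1, 0.5, 0.9]:
    p = lambda x, e=eps: 1.0 + e * math.cos(2 * math.pi * x)
    assert entropy(p, a, b) < h_uniform
```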

The continuous distribution having the largest entropy and having first and second moments $\mu={b+a\over2}$ and $\sigma^2={(b-a)^2\over{12}}$ is $$p(x)={1\over(2\pi\sigma^2)^{1/2}}\exp\left\{-{(x-\mu)^2\over2\sigma^2}\right\}$$ as derived in Bishop PRML p.54. Note that each of these distributions has the same $0^{th}$, $1^{st}$, and $2^{nd}$ moments, and the Gaussian has the larger entropy: $${1\over{2}} \{1+\ln(2\pi \sigma^2)\}= {1\over{2}} \{1+\ln({\pi\over6})+2\ln(b-a)\}>\ln(b-a)$$

A whole collection of possible constraints and corresponding entropy maximizing distributions is given here.