Gibbs measure is concentrated in the set of global minima


I was reading Chii-Ruey Hwang's paper "Laplace's Method Revisited: Weak Convergence of Probability Measures".

Here is the basic premise of the paper:

Let $Q$ be a fixed probability measure on $\mathbf{R}^n$ and $H$ be a continuous function on $\mathbf{R}^n$ s.t.

$$a \gt \inf_x H(x) \implies Q\{H(x) < a\} \gt 0$$

Let $N$ be the set of minimal states of $H$, i.e.

$$N = \{x \mid H(x) = \inf_y H(y) \} $$

He goes on to describe a probability measure on the set $N$. In order to do this he defines the measures $P_\theta$ as

$$\frac{dP_\theta}{dQ}(x) = \frac{\exp{(-H(x)/\theta)}}{\int \exp{(-H(x)/\theta)}dQ(x)}$$

He goes on to claim that if $P_\theta \rightarrow P$ weakly as $\theta \rightarrow 0$ then $P$ is the required measure on the set $N$. He presents this as a corollary. The theorem he proves is the following:

Theorem: If $H$ does not have a minimum then $P_\theta$ is not tight.

He then assumes that the minimum of $H(x)$ exists and is $0$. Under this assumption he gives the following corollary:

If $P_\theta \rightarrow P$ weakly then $P$ concentrates on $N$.

The corollary is presented without proof.

Any insight into the motivation for constructing such measures, and into why the corollary follows from the theorem, would be helpful. Thanks.

EDIT If you think the proof of the theorem might help, let me know and I will post it. Apparently this has deep ties with statistical mechanics, and in a lot of places people have used it directly. I don't see why this is so obvious.

EDIT So I guess an equivalent question would be why the Gibbs distribution concentrates on the set of minima. As I found out, this is the basis of simulated annealing techniques.

Best answer

For the first part, which uses the theorem: since $P_\theta$ converges weakly to $P$ as $\theta \rightarrow 0$, the family $\{P_\theta\}$ is tight along any sequence $\theta \rightarrow 0$. Hence, by the contrapositive of the theorem, $H(x)$ has a minimum, which is $0$ by assumption.

The rest is just showing that $P_\theta$ concentrates around the set of minima as $\theta$ decreases to $0$.
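As a quick sanity check of this concentration, here is a minimal numerical sketch. Everything in it is an illustrative assumption rather than something from Hwang's paper: $Q$ is taken to be the uniform measure on a finite grid in $\mathbf{R}$, $H(x) = (x^2 - 1)^2$ is a double-well function with minima at $\pm 1$, and the helper names `gibbs_weights` and `mass_on_minima` are made up for this example.

```python
import math


def gibbs_weights(points, q, H, theta):
    """Weights of P_theta for a discrete base measure Q with atoms q[i] at points[i]."""
    energies = [H(x) for x in points]
    # Shifting by the smallest energy avoids underflow in every term at once;
    # the shift cancels between numerator and normalising constant.
    e_min = min(energies)
    unnorm = [qi * math.exp(-(e - e_min) / theta) for qi, e in zip(q, energies)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def H(x):
    # Hypothetical double-well energy: minimum value 0, attained at x = -1 and x = 1.
    return (x * x - 1.0) ** 2


# Grid -2, -1.75, ..., 2; it contains -1.0 and 1.0 exactly, so Q(N) > 0 here.
points = [i / 4 for i in range(-8, 9)]
q = [1 / len(points)] * len(points)


def mass_on_minima(theta):
    """P_theta(N): total Gibbs mass on the grid points where H vanishes."""
    p = gibbs_weights(points, q, H, theta)
    return sum(pi for x, pi in zip(points, p) if H(x) == 0.0)


for theta in (1.0, 0.1, 0.01):
    print(theta, mass_on_minima(theta))
```

In this toy setting the printed mass on $N = \{-1, 1\}$ increases toward $1$ as $\theta$ shrinks, and by symmetry it is split evenly between the two minima, in line with the limit being $Q$ restricted to $N$ and renormalised.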

$$dP_\theta(x) = \frac{\exp{(-H(x)/\theta)}dQ(x)}{\int \exp{(-H(x)/\theta)}dQ(x)} = \frac{\exp{(-H(x)/\theta)}dQ(x)}{Q(N) + \int_{\mathbf{R}^n \setminus N} \exp{(-H(x)/\theta)}dQ(x)}$$

since on $N$ the exponential equals $1$ (recall $\min H(x)$ is $0$ by assumption). In the second term of the denominator the integrand is bounded by $1$ and tends to $0$ pointwise, because $H(x) > 0$ on $\mathbf{R}^n \setminus N$; by dominated convergence that term therefore goes to $0$ as $\theta$ goes to $0$. The numerator factor $\exp{(-H(x)/\theta)}$ is $1$ for all $\theta$ if $H(x) = 0$, and otherwise it goes to $0$ as $\theta$ goes to $0$. Thus, provided $Q(N) > 0$,

$$\frac{dP}{dQ}(x) = \frac{1}{Q(N)}\mathbf{1}[x \in N]$$

And thus we are done.
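The limiting formula divides by $Q(N)$, so it only makes sense when $Q(N) > 0$. Here is a sketch of an argument for the corollary that does not need this, using only the continuity of $H$ and the positivity condition $Q\{H(x) < a\} > 0$ for $a > \inf_x H(x)$ from the question; treat it as one possible route, not necessarily the one in the paper. Fix $\varepsilon > 0$. Bounding the numerator from above on $\{H \ge \varepsilon\}$ and the denominator from below on $\{H < \varepsilon/2\}$ gives

$$P_\theta\{H \ge \varepsilon\} = \frac{\int_{\{H \ge \varepsilon\}} \exp{(-H(x)/\theta)}dQ(x)}{\int \exp{(-H(x)/\theta)}dQ(x)} \le \frac{\exp{(-\varepsilon/\theta)}}{\exp{(-\varepsilon/(2\theta))}\,Q\{H < \varepsilon/2\}} = \frac{\exp{(-\varepsilon/(2\theta))}}{Q\{H < \varepsilon/2\}} \rightarrow 0$$

as $\theta \rightarrow 0$. Since $H$ is continuous, $\{H > \varepsilon\}$ is open, so weak convergence (the portmanteau theorem) yields

$$P\{H > \varepsilon\} \le \liminf_{\theta \rightarrow 0} P_\theta\{H > \varepsilon\} = 0$$

Taking the union over $\varepsilon = 1/k$, $k = 1, 2, \dots$, gives $P\{H > 0\} = 0$, i.e. $P(N) = 1$.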