Low-effort question incoming. Given a set of states $x_i$, $i=1,\dots,n$, with energies $U(x_i)\geq 0$, define the probability of a state $x$ as $$ \pi(x)=\frac{1}{Z_T}e^{-\frac{1}{T} U(x)}, $$ where $Z_T$ is just a normalization constant. According to Wikipedia, this is the maximum-entropy distribution on the set of states for a given mean energy (so I guess you can tune $T$ to hit any mean).

Then I came across superstatistics https://en.wikipedia.org/wiki/Superstatistics, where one defines $\beta=\frac{1}{T}$ and, given a distribution $f(\beta)$ of the inverse temperature on $(0,\infty)$, extends the definition to $$ \pi(x)=\frac{1}{Z}\int_{0}^\infty f(\beta)\,e^{-\beta U(x)}\,d\beta, $$ where $Z$ again normalizes over the states. I suspect Bernstein's theorem https://en.wikipedia.org/wiki/Bernstein's_theorem_on_monotone_functions is relevant here: a mixture of decaying exponentials like the integral above is completely monotone as a function of $U(x)$, and the theorem says the converse holds as well. I have two somewhat informal questions:
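For concreteness, here is a quick numerical sanity check I wrote (my own sketch; the Gamma mixing density for $f(\beta)$ is an arbitrary choice) that the mixed exponential weight $B(U)=\int_0^\infty f(\beta)e^{-\beta U}\,d\beta$ is completely monotone, i.e. its finite differences alternate in sign:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mixing density f(beta): a Gamma distribution on (0, inf).
# Averaging exp(-beta * U) over Monte Carlo samples of beta approximates
# the superstatistics weight B(U) = ∫ f(β) e^{-β U} dβ.
betas = rng.gamma(shape=2.0, scale=1.0, size=100_000)

def B(U):
    """Monte Carlo estimate of the mixed Boltzmann factor at energies U."""
    return np.exp(-np.outer(np.atleast_1d(U), betas)).mean(axis=1)

# Complete monotonicity: (-1)^k times the k-th forward difference of B
# is nonnegative for every k (here checked up to order 4 on a grid).
U = np.linspace(0.1, 3.0, 30)
vals = B(U)
for k in range(1, 5):
    vals = np.diff(vals)  # k-th forward difference after k passes
    assert np.all((-1) ** k * vals > 0), f"sign pattern broken at order {k}"
print("finite differences alternate in sign up to order 4")
```

The alternation here is exact rather than a Monte Carlo accident: each sample's $k$-th difference of $e^{-\beta U}$ in $U$ already has sign $(-1)^k$, and the estimate just averages those.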
- What is it with exponentials and these expectations of exponentials? What would be lost by picking some other decreasing function, $x<y \Rightarrow f_T(x)>f_T(y)$, and defining $\pi(x)\propto f_T(U(x))$? Is complete monotonicity somehow playing an important hidden role?
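To illustrate what I mean by "some other decreasing function": one concrete property I noticed that the exponential has and a generic $f_T$ lacks is turning energy sums into weight products, so independent subsystems stay independent under $\pi$. A toy check, with $f_T(U)=1/(1+U/T)$ as an arbitrary alternative I made up:

```python
import numpy as np

T = 2.0
U1, U2 = 0.7, 1.3  # energies of two independent subsystems

# Boltzmann weight: exp(-U/T) maps the energy sum U1 + U2 to the
# product of the individual weights.
boltz = lambda U: np.exp(-U / T)
assert np.isclose(boltz(U1 + U2), boltz(U1) * boltz(U2))

# A hypothetical alternative decreasing weight: still gives a valid
# pi after normalization, but the product rule fails.
alt = lambda U: 1.0 / (1.0 + U / T)
print(alt(U1 + U2), alt(U1) * alt(U2))  # differ in general
assert not np.isclose(alt(U1 + U2), alt(U1) * alt(U2))
```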
- Why do people use the Boltzmann distribution for MCMC on the traveling salesman problem (TSP), for example? Why not any other family of distributions that concentrates into a delta at the minimum as $T\rightarrow 0$? Is there some kind of optimal convergence?
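For reference, the kind of MCMC I mean is simulated annealing with the Metropolis rule, where a move that changes the energy by $\Delta U$ is accepted with probability $\min(1, e^{-\Delta U/T})$. A minimal sketch on a random TSP instance (the geometric cooling schedule and the 2-opt move set are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small TSP instance: random cities in the unit square.
pts = rng.random((12, 2))

def tour_length(tour):
    """Total length of the closed tour; this plays the role of U(x)."""
    return np.linalg.norm(pts[tour] - pts[np.roll(tour, 1)], axis=1).sum()

def anneal(n_steps=10_000, T0=1.0, Tmin=1e-3):
    tour = rng.permutation(len(pts))
    U = tour_length(tour)
    for step in range(n_steps):
        # Geometric cooling schedule (one common choice, not the only one).
        T = T0 * (Tmin / T0) ** (step / n_steps)
        # Propose a 2-opt move: reverse a random segment of the tour.
        i, j = sorted(rng.integers(0, len(pts), size=2))
        cand = tour.copy()
        cand[i:j + 1] = cand[i:j + 1][::-1]
        dU = tour_length(cand) - U
        # Metropolis rule with Boltzmann acceptance exp(-dU/T).
        if dU <= 0 or rng.random() < np.exp(-dU / T):
            tour, U = cand, U + dU
    return tour, U

tour, U = anneal()
print("final tour length:", U)
```

As $T\to 0$ the acceptance probability for uphill moves vanishes, which is the "delta at the minimum" behavior the question is about; the puzzle is why $e^{-\Delta U/T}$ specifically and not some other vanishing acceptance function.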