Markov Chain Monte Carlo thermalization time estimation (not by eye)


For a given MCMC algorithm, there are two important time (= step) scales.

  1. $\tau_{\text{thermalization}}$, also known as the burn-in or initialization time.
  2. $\tau_{\text{independent}}$, the time scale over which $X_i$ and $X_{i+\tau_{\text{independent}}}$ become independent of each other.

$\tau_{\text{independent}}$ can be extracted from the autocorrelation function of a suitable random variable: $$C(t):=\langle X_i X_{i+t}\rangle -\langle X_i \rangle^2 \sim e^{-t/\tau_{\text{independent}}}$$
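As a minimal sketch of how this extraction could look in practice (the helper name and the 1/e-crossing criterion are my own choices, not from the question), one can compute the normalized autocorrelation via FFT and read off the lag at which it falls below $1/e$, consistent with $C(t) \sim e^{-t/\tau}$:

```python
import numpy as np

def correlation_time(x):
    """Estimate tau_independent as the lag at which the normalized
    autocorrelation C(t)/C(0) first decays below 1/e, consistent
    with C(t) ~ exp(-t / tau).  (Illustrative helper.)"""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # autocovariance at all lags at once via zero-padded FFT
    f = np.fft.rfft(xc, 2 * n)
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    acf /= acf[0]                      # normalize so acf[0] == 1
    below = np.flatnonzero(acf < np.exp(-1.0))
    # if the acf never drops below 1/e, the series is too short
    return float(below[0]) if below.size else float(n)
```

For a series with a purely exponential autocorrelation (e.g. an AR(1) process with coefficient $\phi = e^{-1/\tau}$), this returns roughly $\tau$; for other decay shapes it is only a rough characteristic scale, and a fit or an integrated autocorrelation time may be preferable.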

For $\tau_{\text{thermalization}}$, I have not seen any explicit definition. Usually, I plot the series $X_i$ and truncate it at the point where it looks "steady".

My goal is to write a program that estimates $\tau_{\text{thermalization}}$ and $\tau_{\text{independent}}$ before a major Monte Carlo simulation.

So I need an expression or algorithm for $\tau_{\text{thermalization}}$, not an eyeball judgement.

There is 1 answer below.

Coming from the molecular simulation community, where MCMC is heavily used alongside molecular dynamics to sample complex configurations of interacting molecules, I agree that this aspect is often glossed over and left as a "rule of thumb". However, I am aware of at least one paper, by John Chodera, where an attempt has been made to tackle it objectively: https://www.biorxiv.org/content/early/2015/07/04/021659 which was also published in J. Chem. Theory Comput., 12, 1799 (2016). The author also provides Python software that implements the calculation automatically. Maybe this is what you need.

I won't try to reproduce the mathematics here, but the basic idea is to take the standard procedure for estimating statistical errors in correlated sequences of data, which involves estimating the correlation time (or statistical inefficiency), i.e. the interval between effectively independent samples that you mention in your question, and apply it to the interval $(t_0,t_{\text{max}})$ covering the period from the (proposed) end of the equilibration period to the end of the whole dataset. This calculation behaves in a predictable way if the dataset is at equilibrium; the method carries it out systematically as a function of $t_0$, reducing it from an initial high value, and attempts to detect the value of $t_0$ at which deviations from the expected behaviour appear.
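One concrete form of this idea in the Chodera paper is to choose the $t_0$ that maximizes the number of effectively uncorrelated samples in the remaining data, $N_{\text{eff}}(t_0) = (N - t_0)/g(t_0)$, where $g$ is the statistical inefficiency. The sketch below follows that scheme, but the helper names, the candidate grid, and the simple positive-autocorrelation cutoff for $g$ are my own simplifications, not the paper's exact procedure:

```python
import numpy as np

def statistical_inefficiency(x):
    """g = 1 + 2 * (sum of autocorrelations, truncated at the first
    non-positive value); g >= 1 by construction.  Simplified sketch."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    var = xc.dot(xc)
    if var == 0.0:
        return 1.0
    f = np.fft.rfft(xc, 2 * n)
    acf = np.fft.irfft(f * np.conjugate(f))[:n] / var
    g = 1.0
    for t in range(1, n):
        if acf[t] <= 0.0:          # crude truncation of the noisy tail
            break
        g += 2.0 * acf[t]
    return max(g, 1.0)

def detect_equilibration(x, n_candidates=50):
    """Scan candidate burn-in times t0 and keep the one maximizing
    N_eff(t0) = (n - t0) / g(t0), following the idea of the Chodera
    (2016) paper.  Returns (t0, g, N_eff)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    candidates = np.unique(np.linspace(0, n - 2, n_candidates, dtype=int))
    best = (0, 1.0, 0.0)
    for t0 in candidates:
        g = statistical_inefficiency(x[t0:])
        n_eff = (n - t0) / g
        if n_eff > best[2]:
            best = (int(t0), g, n_eff)
    return best
```

For a series with an initial transient followed by equilibrated noise, the transient inflates $g$ on the early candidates, so $N_{\text{eff}}$ peaks once $t_0$ clears the transient; that peak is the estimated $\tau_{\text{thermalization}}$. For production use, the author's own software is likely more robust than this sketch.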

Anyway, hopefully that paper will be helpful.