Good convergence criterion for stochastic optimization?


This is a question that has bothered me for quite a long time, as I have faced it in many different optimization and equation-solving problems.

The basic idea is that one wishes to minimize $F(x)$ and has a method that produces a sequence $\{x_n\}$ converging to a neighborhood of a local minimum. Unfortunately, the method is not monotonic: $\{F(x_n)\}$ converges but is not strictly decreasing. The question is: what would be a good quantitative criterion to test convergence based solely on the values of $F(x_n)$?

Examples of this situation are Monte Carlo methods, but also some numerical methods for variational approximation of the energy of quantum states, where the "$x_n$" live in a reduced variational subspace. In the first case the non-monotonicity comes from the stochastic algorithm itself, while in the second it arises from projecting the gradient onto the manifold in which the variational state lives. In many of these cases the sequences produced look like $$F_n = F_{min} + \exp(-n/\xi) + \epsilon_n,$$ where $\xi$ is some unknown convergence time and $\epsilon_n$ is noise whose magnitude also cannot be known in advance.
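For concreteness, a sequence of this form can be generated synthetically; the values of $F_{min}$, $\xi$ and the Gaussian noise scale below are illustrative placeholders (in practice none of them is known):

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_F(n_steps, F_min=1.0, xi=50.0, noise=0.01):
    """Generate F_n = F_min + exp(-n/xi) + eps_n with i.i.d.
    Gaussian eps_n of scale `noise` (all parameters illustrative)."""
    n = np.arange(n_steps)
    return F_min + np.exp(-n / xi) + noise * rng.standard_normal(n_steps)
```

Plotting such a sequence makes the problem visible: once $\exp(-n/\xi)$ drops below the noise scale, the values fluctuate around $F_{min}$ without ever becoming monotone.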

Sometimes I have used the simple criterion of stopping the algorithm when $F(x_{n+1}) \geq F(x_n)$, but this has the disadvantage that quite often the real minimum $F_{min}$ is not reached (because of the fluctuations mentioned above).
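In code, this naive criterion amounts to stopping at the first non-decrease, which a single noise fluctuation can trigger long before convergence (a minimal sketch):

```python
def stop_on_first_increase(F_values):
    """Index at which the naive criterion F_{n+1} >= F_n first fires
    (or the last index if it never does). A single upward noise
    fluctuation stops the iteration prematurely."""
    for n in range(len(F_values) - 1):
        if F_values[n + 1] >= F_values[n]:
            return n
    return len(F_values) - 1
```

For example, on the values $3.0, 2.0, 2.1, 1.0$ it stops at index 1, even though the sequence later reaches a much lower value.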

Note: I have also tried studying the moving average of $x_n$, but this is often not useful, for instance when many different $x_n$ are equally valid, as happens with rotational symmetries or more complex situations; this is why I prefer to focus on the value of the optimized function itself.

Note 2: Sometimes one does not seek maxima or minima but simply convergence of some quantity, which behaves very similarly, except that one does not know whether it increases or decreases: $$F_n = F_{min} \pm \exp(-n/\xi) + \epsilon_n.$$
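One family of criteria I have been considering, which works regardless of the approach direction, compares the means of two consecutive windows of $F_n$ against the scatter within a window; the window length `W` and threshold `tol` are hypothetical tuning parameters, and this is only a sketch of the kind of test I am asking about:

```python
import numpy as np

def window_converged(F_values, W=50, tol=3.0):
    """Compare the means of the last two windows of length W;
    declare convergence when their difference is small relative to
    the standard error of the mean within the latest window.
    W and tol are hypothetical tuning parameters."""
    if len(F_values) < 2 * W:
        return False
    a = np.asarray(F_values[-2 * W:-W])
    b = np.asarray(F_values[-W:])
    sem = b.std(ddof=1) / np.sqrt(W)
    return abs(b.mean() - a.mean()) <= tol * sem
```

This is direction-agnostic (it only looks at the gap between window means), but it still leaves open how to choose `W` without knowing $\xi$ or the noise magnitude, which is the heart of my question.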