State-of-the-art techniques for maximizing a differentiable function with multiple local maxima

149 Views Asked by At

Suppose I want to find the global maximum of a differentiable function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ with multiple local maxima, where $n$ is on the order of $100$. I have access to both $f$ and its gradients $\nabla f$. What are the state-of-the-art techniques for solving this problem? Gradient descent alone does not suffice for my problem because it gets stuck in local maxima.

1

There are 1 best solutions below

3
On

this problem is likely unsolvable, sorry :)

Let's say you are looking for a global maximum in a box of length 1 and dimension $m=100$. Further, let's even assume that you know the value of that maximum $F_{\max}$, you just don't know where it is. Let's also assume that the absolute values of all derivatives $|\nabla F|$ are upper-bounded by some number $D$. Whenever you sample one point from the function you get information about whether the maximum might be located within a certain patch localized around the sample point. The exact size and shape of the patch would depend on the value of the function, value of the gradient and $D$. Thing is, you don't get any information on what happens outside that patch. So, in order to guarantee finding the maximum, you must have sufficient patches to cover the entire volume of the cube. The number of patches you need grows exponentially with dimension, so even if you need 2 patches per dimension, you still need $2^{100} \approx 10^{30}$ patches, which is not computable on a regular hardware. Further concerns:

  • If you don't know $F_{max}$ or $D$ or the size of the bounding box, you can't even estimate the minimal number of evaluations necessary to guarantee finding maximum.
  • You are not only limited by time, but also by storage. Given sufficient problem size remembering what parts of the domain have already been explored becomes an issue.

At least in astronomy, I have seen people use random search (e.g. MCMC) to get the probability distribution of different values occuring in the domain, and then use some prior knowledge to extrapolate the probability distribution to make an educated guess of what the maximum could be. No surprise, depending on prior assumptions and luck, they sometimes are wrong by a few orders of magnitude, but it is not their fault :D