Global optimization of a well-defined function with gradient information


I am trying to minimize the function

$$ f(x_1, \dots, x_n) = -\sum\limits_{i=1}^{n} a_i \cos(4(x_i - b_i)) \; - \sum\limits_{(i,j) \in \text{edges}} \cos(4(x_i - x_j)) $$

$$x_i,b_i\in (-\pi, \pi)$$

where $\sum_{(i,j)\in\text{edges}}$ sums only over a predefined set of edges (the edges are sparse; typically there are 4 edges per node $i$). $a_i, b_i$ are constants with $a_i \in (0,1)$, and many $a_i$ are equal to $0$.

The gradient can be computed directly at each iteration. I find that gradient descent converges very quickly but produces very poor results (it very easily falls into local optima). I have no idea whether the conjugate gradient or BFGS algorithms would make a significant difference.

General-purpose optimizers such as simulated annealing and genetic algorithms do not use gradient information (they seem powerful but are blind to local derivatives), so I am looking for techniques that make good use of the gradient.
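For reference, the simplest way to reuse the cheap analytic gradient while still escaping local minima is multistart: run plain gradient descent from many random initial points and keep the best result. Below is a minimal pure-Python sketch of that idea for this exact objective; the function name, step size, and restart count are illustrative choices, not anything from the thread.

```python
import math
import random

def multistart_descent(a, b, edges, n, restarts=20, steps=500, lr=0.02):
    """Gradient descent with random restarts on
    f(x) = -sum_i a_i*cos(4*(x_i - b_i)) - sum_{(i,j) in edges} cos(4*(x_i - x_j)).
    Illustrative sketch: each local search is cheap because the gradient is
    available in closed form; restarts are the crude globalization step."""

    def f(x):
        val = -sum(a[i] * math.cos(4 * (x[i] - b[i])) for i in range(n))
        val -= sum(math.cos(4 * (x[i] - x[j])) for i, j in edges)
        return val

    def grad(x):
        # d/dx_i of -a_i*cos(4(x_i-b_i)) is 4*a_i*sin(4(x_i-b_i))
        g = [4 * a[i] * math.sin(4 * (x[i] - b[i])) for i in range(n)]
        for i, j in edges:
            s = 4 * math.sin(4 * (x[i] - x[j]))
            g[i] += s   # d/dx_i of -cos(4(x_i-x_j))
            g[j] -= s   # d/dx_j of the same term
        return g

    best_x, best_f = None, float("inf")
    for _ in range(restarts):
        x = [random.uniform(-math.pi, math.pi) for _ in range(n)]
        for _ in range(steps):
            g = grad(x)
            x = [xi - lr * gi for xi, gi in zip(x, g)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

This will not certify global optimality, but on an objective with many equally good basins it often recovers a near-optimal value with very little code.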


There are 3 answers below.


If you approximate your domain by a fine-mesh lattice, you can apply Tabu Search to find the global optimum using only (well, almost only) local information. This algorithm works well on discrete, combinatorial problems, so discretizing the domain onto a lattice puts you in exactly the setting it needs. Once you do that, you are all set to apply the algorithm.

Note: there is no single Tabu Search - it's a whole class of searches that basically prevent re-visiting previously visited parts of the domain. You can adjust the length of the memory and the strength of the prohibition.
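The lattice idea above can be sketched as follows. This is an illustrative pure-Python version, not a reference implementation: states are integer grid points in a discretization of $(-\pi,\pi)^n$, moves change one coordinate by $\pm 1$ (wrapping, since the objective is $2\pi$-periodic), and a fixed-length memory of recently visited states is forbidden ("tabu"). The parameter names and tenure value are my own choices.

```python
import math
from collections import deque

def tabu_search(f, n, steps_per_dim=64, iters=2000, tenure=50):
    """Tabu search on a lattice discretization of (-pi, pi)^n (sketch).
    f takes a list of n floats; recently visited grid states are tabu."""
    step = 2 * math.pi / steps_per_dim
    to_x = lambda s: [-math.pi + step * si for si in s]

    state = tuple([steps_per_dim // 2] * n)
    best_state, best_f = state, f(to_x(state))
    tabu = deque([state], maxlen=tenure)   # fixed-length memory
    tabu_set = {state}

    for _ in range(iters):
        # enumerate all one-coordinate moves, skipping tabu states
        candidates = []
        for d in range(n):
            for delta in (-1, 1):
                nxt = list(state)
                nxt[d] = (nxt[d] + delta) % steps_per_dim
                nxt = tuple(nxt)
                if nxt not in tabu_set:
                    candidates.append(nxt)
        if not candidates:
            break
        # accept the best non-tabu neighbour even if it is uphill;
        # that is what lets the search climb out of local minima
        state = min(candidates, key=lambda s: f(to_x(s)))
        if len(tabu) == tabu.maxlen:
            tabu_set.discard(tabu[0])      # oldest entry is about to be evicted
        tabu.append(state)
        tabu_set.add(state)
        fx = f(to_x(state))
        if fx < best_f:
            best_state, best_f = state, fx
    return to_x(best_state), best_f
```

The mesh resolution (`steps_per_dim`) trades accuracy for search-space size; a coarse tabu solution can also be handed to a gradient-based local solver for refinement.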


When I experiment with a standard local nonlinear solver such as fmincon in MATLAB, and benchmark it against the guaranteed global solver bmibnb available in the MATLAB toolbox YALMIP (disclaimer: developed by me), my impression is that the local nonlinear solver typically finds a solution within a few percent of global optimality. I have only tried $n=100$, as the global solver gets really slow when the dimension grows larger, but up to that size the behaviour is fairly consistent.

n = 100;
a = sprand(n,1,.5);
b = -pi + 2*pi*rand(n,1);
[i,j] = find(sprandn(n,n,4/n));
x = sdpvar(n,1);
Objective = -sum(a'*cos(4*(x-b))) - sum(cos(4*(x(i)-x(j))));
ops = sdpsettings('solver','bmibnb','bmibnb.relgaptol',0.05);
optimize([-pi <= x <= pi], Objective, ops)

* Starting YALMIP global branch & bound.
* Branch-variables : 430
* Upper solver     : fmincon
* Lower solver     : GUROBI
* LP solver        : GUROBI
 Node       Upper      Gap(%)       Lower    Open
    1 :   -3.962E+02     3.48     -4.102E+02   2  Improved solution  
* Finished.  Cost: -396.1672 Gap: 3.48

Solving the same problem using fmincon alone recovers the -396.2 solution that the global solver found (not too surprising, as the global solver starts with a local search, albeit on a severely preprocessed model); hence fmincon on its own is at most 3.5% from global optimality on this instance. This behaviour seems to be the same regardless of which fmincon algorithm is used.

ops = sdpsettings('solver','fmincon','fmincon.algorithm','active-set');
optimize([-pi <= x <= pi], Objective, ops)
ops = sdpsettings('solver','fmincon','fmincon.algorithm','sqp');
optimize([-pi <= x <= pi], Objective, ops)
ops = sdpsettings('solver','fmincon','fmincon.algorithm','interior-point');
optimize([-pi <= x <= pi], Objective, ops)

Hence, if you have a correctly implemented algorithm that uses some approximate second-order knowledge, it should perform fairly well.
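Outside MATLAB, the same "local solver with approximate second-order knowledge, plus restarts" recipe can be tried with SciPy's L-BFGS-B, which builds a quasi-Newton model from the analytic gradient alone. The sketch below assumes SciPy is available; the function name and restart count are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def lbfgs_multistart(a, b, edges, n, restarts=10, seed=0):
    """Bounded L-BFGS-B with analytic gradient and random restarts (sketch).
    a, b are length-n arrays; edges is a list of (i, j) index pairs."""
    rng = np.random.default_rng(seed)
    ei = np.array([i for i, _ in edges], dtype=int)
    ej = np.array([j for _, j in edges], dtype=int)

    def fg(x):
        d = 4 * (x - b)
        f = -np.sum(a * np.cos(d))
        g = 4 * a * np.sin(d)
        if len(edges):
            e = 4 * (x[ei] - x[ej])
            f -= np.sum(np.cos(e))
            s = 4 * np.sin(e)
            np.add.at(g, ei, s)   # accumulate edge gradients per node
            np.add.at(g, ej, -s)
        return f, g

    best = None
    for _ in range(restarts):
        x0 = rng.uniform(-np.pi, np.pi, n)
        res = minimize(fg, x0, jac=True, method="L-BFGS-B",
                       bounds=[(-np.pi, np.pi)] * n)
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun
```

As in the fmincon experiments above, each restart is fast because the gradient is exact; the restarts are what buy robustness against bad basins.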


Thanks Bey and Johan! I will have a look at Tabu search and the MATLAB tools. Today I tried a quadratic program and it works! First, the energy function is redefined (slightly different from the original one):

$$ f(x_1, \dots, x_n)=\sum\limits_{i}^{n}(x_i-m_i)^2 +\sum\limits_{ij}^{edge}(x_i-x_j)^2 $$

Letting $x=(x_1, \dots, x_n)$, we get a quadratic form (up to a constant):

$$ f(x)= \frac{1}{2}x^TAx-b^Tx $$

which is minimized by solving the linear system $Ax=b$ in one shot, instead of an iterative process! I programmed it in Java and plotted the results (the red points are the $m_i$). Here $n=2156$, so the matrix $A$ is $2156\times2156$; it takes several seconds to compute its inverse and solve the system.
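For concreteness, the one-shot solve can be sketched in a few lines (the poster used Java; this Python version with an assumed function name is purely illustrative). Setting the gradient of the quadratic energy to zero gives $(I + L)x = m$, where $L$ is the graph Laplacian of the edge set:

```python
import numpy as np

def solve_quadratic_energy(m, edges):
    """Minimize sum_i (x_i - m_i)^2 + sum_{(i,j) in edges} (x_i - x_j)^2.
    Gradient = 0 gives (I + L) x = m with L the graph Laplacian, so the
    minimizer comes from a single linear solve (illustrative sketch)."""
    n = len(m)
    A = np.eye(n)
    for i, j in edges:
        A[i, i] += 1.0   # degree terms of the Laplacian
        A[j, j] += 1.0
        A[i, j] -= 1.0   # off-diagonal adjacency terms
        A[j, i] -= 1.0
    return np.linalg.solve(A, np.asarray(m, dtype=float))
```

One practical note: a direct solve (`np.linalg.solve`, or a sparse Cholesky factorization since $A$ is sparse and positive definite) is both faster and more accurate than explicitly computing $A^{-1}$.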

However, the sharp peak at each $m_i$ is not desirable; I guess modifying the energy function would change the curvature at the peaks.