Outer approximation algorithm for non-convex integer nonlinear program


Consider a particular non-convex binary nonlinear problem of the following form: $$ \min_X f(X) \\ \text{s.t. } X = (x_1, \ldots, x_L)^T \in \{0, 1\}^{L \times V } \\ \sum_{j=1}^V X_{ij} = \sum_{j=1}^V [x_i]_{j} = 1, \quad \forall i=1,\ldots,L $$ where the function $f$ is once continuously differentiable, nonlinear and non-convex. The approximate problem sizes are $L=20$, $V=30000$. The problem is thus about choosing a sequence of $L$ elements from a large vocabulary of size $V$. It is simpler than a general non-convex MINLP, since there are no continuous variables and hence no continuous constraints. It could also be modelled with $L$ integer-constrained variables (since exactly one binary variable in each $x_i$ is non-zero), but then the continuous relaxation is not meaningful and so not useful to compute: the vocabulary has no order, so neighboring integers do not correspond to neighboring vocabulary elements.
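To make the encoding concrete, here is a minimal sketch of the one-hot representation of $X$ and a feasibility check for the row constraints, on toy sizes (the real problem has $L=20$, $V=30000$; the function names are mine, for illustration only):

```python
V = 5  # toy vocabulary size for illustration

def encode(sequence, V):
    """Map a sequence of vocabulary indices to its one-hot matrix X."""
    return [[1 if j == idx else 0 for j in range(V)] for idx in sequence]

def is_feasible(X):
    """Check X in {0,1}^{L x V} with exactly one 1 per row,
    i.e. each sequence position selects exactly one vocabulary element."""
    return all(set(row) <= {0, 1} and sum(row) == 1 for row in X)

X = encode([2, 0, 4], V)  # a sequence of length L = 3
assert is_feasible(X)
```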

I am trying to adapt an outer approximation algorithm, similar to the ones in DICOPT and AIMMS. On non-convex problems these algorithms do not guarantee global optimality, but they can quickly find good (locally optimal) feasible solutions. They sequentially linearize the nonlinear functions with gradient cuts and solve the resulting MILPs.
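The loop I have in mind can be sketched on a toy instance as follows. This is only an illustration: the MILP master is solved here by brute-force enumeration (workable only at toy sizes; a real implementation would call a MILP solver), the objective `f` is a stand-in I made up, and I use the fact that with penalties $\pi_k \geq 1$ the optimal slacks are $r_k = 0$, so the master value reduces to $\min_X \max_k \big[f(X^k) + \nabla f(X^k)^T (X - X^k)\big]$ over unvisited $X$:

```python
import itertools

L, V = 2, 4  # toy sizes

def onehot(choice):
    return tuple(tuple(1 if j == c else 0 for j in range(V)) for c in choice)

def s(X):  # scalar summary used by the toy objective
    return sum(j * X[i][j] for i in range(L) for j in range(V))

def f(X):  # toy non-convex objective (illustrative stand-in)
    return (s(X) - 4.0) ** 2 * (s(X) - 1.0)

def grad_f(X):  # analytic gradient of the toy f w.r.t. each entry X[i][j]
    ds = 2 * (s(X) - 4.0) * (s(X) - 1.0) + (s(X) - 4.0) ** 2
    return [[ds * j for j in range(V)] for _ in range(L)]

def cut(Xk, X):  # gradient cut built at iterate Xk, evaluated at candidate X
    g = grad_f(Xk)
    return f(Xk) + sum(g[i][j] * (X[i][j] - Xk[i][j])
                       for i in range(L) for j in range(V))

feasible = [onehot(c) for c in itertools.product(range(V), repeat=L)]
visited = [feasible[-1]]                    # arbitrary starting point
best = visited[0]
for _ in range(6):                          # a few OA iterations
    candidates = [X for X in feasible if X not in visited]
    if not candidates:
        break
    # MILP master by enumeration: minimize the max of all gradient cuts
    Xnew = min(candidates, key=lambda X: max(cut(Xk, X) for Xk in visited))
    visited.append(Xnew)
    best = min(best, Xnew, key=f)
```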

The main issue is that I do not know how to handle the linear cuts of the objective function: because of the non-convexity, the cuts are not valid underestimators of the function.

In each iteration I solve the following MILP, given the solutions from the previous iterations $\{{X}^k\}_{k=0}^{K-1}$:

$$ (X^K, \alpha^K, r^K) = \arg\min_{X, \alpha, r} \alpha + \pi^T r \\ \text{s.t. } X = (x_1, \ldots, x_L)^T \in \{0, 1\}^{L \times V } \\ \sum_{j=1}^V X_{ij} = \sum_{j=1}^V [x_i]_{j} = 1, \quad \forall i=1,\ldots,L \\ f(X^k) + \nabla f(X^k)^T (X-X^k) \leq \alpha + r_k, \quad \forall k=0,\ldots,K-1 \\ r_k \geq 0, \quad \forall k=0,\ldots,K-1 \\ ||X - X^k||_1 \geq 1, \quad \forall k=0,\ldots,K-1 $$ where $\pi$ is a fixed vector of penalty multipliers, and the last constraint excludes previously found integer solutions, so that the MILP returns a new one.
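Note that the exclusion constraint $||X - X^k||_1 \geq 1$ is linear once $X$ is binary: it becomes $\sum_{(i,j):\, X^k_{ij}=0} X_{ij} + \sum_{(i,j):\, X^k_{ij}=1} (1 - X_{ij}) \geq 1$ (a standard "no-good" cut). A small sketch that builds the cut as coefficients plus a constant, with function names of my own choosing:

```python
def no_good_cut(Xk):
    """Return (coef, const) such that  sum_ij coef[i][j]*X[i][j] + const >= 1
    is equivalent to ||X - Xk||_1 >= 1 for binary X."""
    coef = [[1 if v == 0 else -1 for v in row] for row in Xk]
    const = sum(v for row in Xk for v in row)  # number of ones in Xk
    return coef, const

def satisfied(cut, X):
    coef, const = cut
    lhs = sum(coef[i][j] * X[i][j]
              for i in range(len(X)) for j in range(len(X[0]))) + const
    return lhs >= 1

Xk = [[1, 0, 0], [0, 1, 0]]
cut = no_good_cut(Xk)
assert not satisfied(cut, Xk)                  # X^k itself is excluded
assert satisfied(cut, [[0, 1, 0], [0, 1, 0]])  # any other point passes
```

Since every row of a feasible $X$ is one-hot, two distinct feasible points always differ in at least two entries, so this cut never removes any feasible point other than $X^k$ itself.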

  1. Do I need to relax the incorrect cuts with the additional variables $r_k$ and penalize them in the objective? Or does that relaxation happen implicitly, since the cuts on the objective are not hard constraints (they can be absorbed by $\alpha$)? Are there better ways to handle the non-convexity of the objective function?
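To illustrate why the cuts are wrong in the non-convex case, here is a 1-D toy example with the concave function $f(x) = -x^2$: the tangent at $x^k$ lies above $f$ everywhere else, so the epigraph cut forces $\alpha$ above the true objective value at every other point (the function is my own stand-in, not the actual $f$):

```python
def f(x):
    return -x * x          # concave, so tangents overestimate f

def cut(xk, x):            # linearization of f at xk, evaluated at x
    return f(xk) + (-2.0 * xk) * (x - xk)

xk, xstar = 0.0, 1.0
assert cut(xk, xstar) > f(xstar)   # the cut overestimates f at x*
```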

  2. Generally, which algorithms are the most effective for this problem? Exact methods based on branch-and-bound seem difficult to apply because of (1) the problem size and (2) the lack of good continuous NLP solvers for $f$, or of any known lower-bounding approximation of it. Treat $f$ as a black-box function that also provides gradients. Moreover, an exact solution is not necessary: any solution with $f(X^*) \leq t$ for a given threshold $t$ is sufficient.
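For concreteness, the target-value stopping rule I have in mind would sit around any iterative method like this (all names are hypothetical placeholders; `next_iterate` stands for one master-problem solve):

```python
def run_until_target(initial, next_iterate, f, t, max_iters=100):
    """Iterate until f drops to the target t or the budget runs out.
    Returns the best point found and whether the target was reached."""
    X, best = initial, initial
    for _ in range(max_iters):
        if f(best) <= t:
            return best, True
        X = next_iterate(X)
        if f(X) < f(best):
            best = X
    return best, f(best) <= t
```

Usage on a trivial 1-D stand-in: `run_until_target(5, lambda x: x - 1, lambda x: x * x, 4)` stops as soon as an iterate reaches the target value.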