Efficient solution to simple linear program for high dimensions

Let $\eta, \delta, v\in \mathbb{R}^N$. I am trying to optimize

$$\max_{\eta} \eta^\top \delta\quad\text{s.t.}\quad \eta_i\in [a_i, b_i] \,\,\wedge\,\,v^\top\eta = 0$$

quickly (ideally < 1 s) in high dimensions ($N > 10^6$). One brute-force solution would be to use a specialised interior-point optimiser (e.g. IPOPT or Gurobi), but such packages are hard to distribute to a wider user base as part of an open-source package (installation can be challenging). Alternatives such as CVXOPT or SciPy fail miserably according to some initial experiments. Another option would be to perform projected gradient descent, but this adds additional hyperparameters and can take time to converge.

I'd very much appreciate a more clever (and efficient) solution, if any exists.

Best answer:

Yes, you can solve this efficiently (linear in the number of variables). For simplicity, I'll just outline an algorithm, and I'll switch notation to something more familiar to me. First, note that we can shift and rescale the variables so that they're all constrained to the unit interval:

$$ \begin{gather*} \min_x c^T x \\ 0 \le x \le 1 \\ a^T x = b \end{gather*} $$

The corresponding dual problem is:

$$ \begin{gather*} \max_{\nu,\lambda_1,\lambda_2} -1^T \lambda_2 - b \nu \\ \lambda_1 \ge 0, \quad \lambda_2 \ge 0 \\ \lambda_1 - \lambda_2 = c + \nu a \end{gather*} $$

The solution to the primal can be recovered from the dual, since the KKT conditions imply:

$$ x^*_i = \begin{cases} 0 & c_i + \nu^* a_i > 0 \\ 1 & c_i + \nu^* a_i < 0 \end{cases} $$

The remaining coordinates, where $c_i + \nu^* a_i = 0$, can then be chosen to be any primal-feasible values.
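
(For completeness, one way to see where this dual comes from: introduce a multiplier $\nu$ for the equality constraint and $\lambda_1, \lambda_2 \ge 0$ for the bounds, giving the Lagrangian

$$ \mathcal{L}(x, \nu, \lambda_1, \lambda_2) = c^T x + \nu \left(a^T x - b\right) - \lambda_1^T x + \lambda_2^T \left(x - 1\right). $$

Minimizing over $x$ forces the stationarity condition $c + \nu a - \lambda_1 + \lambda_2 = 0$, i.e. $\lambda_1 - \lambda_2 = c + \nu a$, and the surviving constant terms $-1^T \lambda_2 - b \nu$ are exactly the dual objective.)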

To solve the dual problem, we can optimize out $\lambda_1$ and $\lambda_2$, resulting in a 1-dimensional problem:

$$ \max_{\nu} \; g(\nu) := -b \nu + \sum_i \min(0, c_i + \nu a_i) $$

This function $g$ is piecewise linear with breakpoints at $-c_i/a_i$. So, assuming the problem is primal feasible and the dual is bounded, the optimum must occur at one of the breakpoints, and the problem reduces to finding the maximum over a finite set of values:

$$ j^* = {\mathop{\operatorname{arg\,max}}_j} \; g\left(-c_j/a_j\right) \;\Rightarrow\; \nu^* = -c_{j^*}/a_{j^*} $$

Naively evaluating $g$ at each of the breakpoints takes $O(n^2)$ time, since there are $n$ breakpoints and each evaluation of $g$ takes $O(n)$ time. It's better to sort the breakpoints and keep track of partial sums, which reduces the cost of evaluating $g$ to $O(1)$ per breakpoint; this gives an $O(n \log n)$ algorithm due to the sort. Furthermore, if you want to get fancy, you can run a variation of quickselect on the breakpoints to get an $O(n)$ algorithm.
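
A minimal NumPy sketch of the $O(n \log n)$ variant (illustrative only, not the answerer's code; it assumes float arrays and at least one $a_i \neq 0$):

```python
import numpy as np

def solve_dual(c, a, b):
    """Maximize g(nu) = -b*nu + sum_i min(0, c_i + nu*a_i) by scanning the
    breakpoints nu = -c_i/a_i with sorted prefix/suffix partial sums."""
    nz = a != 0
    const0 = np.minimum(0.0, c[~nz]).sum()        # coordinates with a_i == 0
    c_, a_ = c[nz], a[nz]
    t = -c_ / a_                                  # breakpoints
    order = np.argsort(t)
    t, c_, a_ = t[order], c_[order], a_[order]

    pos = a_ > 0
    # a_i > 0: the term c_i + nu*a_i is negative (active) iff nu < t_i,
    # so at nu = t[k] the active positive-a terms form a suffix of the sort.
    c_pos_suf = np.cumsum((c_ * pos)[::-1])[::-1]
    a_pos_suf = np.cumsum((a_ * pos)[::-1])[::-1]
    # a_i < 0: the term is active iff nu > t_i, i.e. an exclusive prefix.
    neg = ~pos
    c_neg_pre = np.concatenate(([0.0], np.cumsum(c_ * neg)[:-1]))
    a_neg_pre = np.concatenate(([0.0], np.cumsum(a_ * neg)[:-1]))

    # Terms with t_i == t[k] contribute exactly 0, so it is harmless that
    # the suffix includes index k itself while the prefix excludes it.
    g = -b * t + const0 + (c_pos_suf + t * a_pos_suf) \
                        + (c_neg_pre + t * a_neg_pre)
    k = int(np.argmax(g))
    return t[k], g[k]                             # nu*, g(nu*)
```

Given $\nu^*$, the primal solution then follows from the sign of $c_i + \nu^* a_i$ as described above.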

Addendum: At the end of the day you should get an algorithm like the following (using your notation; a NumPy sketch appears after the list). Initially set $\eta_i = \begin{cases} a_i & \delta_i < 0 \\ b_i & \delta_i > 0 \end{cases}$ (a coordinate with $\delta_i = 0$ can start at either bound). This is the solution to the problem without the equality constraint. Then:

  • If $\eta^Tv=0$ you're done.
  • If $\eta^Tv>0$, then sort the coordinates by $v_i/\delta_i$ in descending order, and flip the values $\eta_i$ to the opposite bound one at a time (decreasing both $\eta^Tv$ and $\eta^T\delta$) until you've satisfied the constraint $\eta^Tv=0$; the final coordinate will generally only need to move part of the way. If you reach a coordinate where $v_i/\delta_i \le 0$ first, the problem is infeasible.
  • If $\eta^Tv<0$, then sort the coordinates by $v_i/\delta_i$ in ascending order, and flip the values $\eta_i$ to the opposite bound one at a time (increasing $\eta^Tv$ and decreasing $\eta^T\delta$) until you've satisfied the constraint $\eta^Tv=0$; again, the final coordinate will generally only need a partial move. If you reach a coordinate where $v_i/\delta_i \ge 0$ first, the problem is infeasible.
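
Here is a minimal NumPy sketch of that greedy procedure (my illustration, untested at scale; it assumes float arrays and, for simplicity, that every $\delta_i \neq 0$ — coordinates with $\delta_i = 0$ cost nothing and should be spent first):

```python
import numpy as np

def greedy_solve(delta, v, a, b, tol=1e-12):
    """max delta^T eta  s.t.  a <= eta <= b,  v^T eta = 0.
    Greedy repair of the box-constrained optimum, as described above.
    Assumes delta_i != 0 for all i."""
    eta = np.where(delta < 0, a, b).astype(float)  # unconstrained optimum
    s = v @ eta
    if abs(s) <= tol:
        return eta

    ratio = v / delta
    order = np.argsort(-ratio) if s > 0 else np.argsort(ratio)
    for i in order:
        if (s > 0 and ratio[i] <= 0) or (s < 0 and ratio[i] >= 0):
            raise ValueError("infeasible: ran out of useful coordinates")
        other = a[i] if eta[i] == b[i] else b[i]   # the opposite bound
        step = v[i] * (other - eta[i])             # change in v^T eta on a full flip
        if abs(step) >= abs(s):                    # last coordinate: partial move
            eta[i] -= s / v[i]                     # makes v^T eta exactly 0
            return eta
        eta[i] = other
        s += step
    raise ValueError("infeasible: constraint could not be met")
```
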
Another answer:

The following is just conjecture; I have not actually experimented with it. (Also, check the math: the caffeine level in my bloodstream is dangerously low.) What about Lagrangean relaxation?

Let $S=\prod_{i=1}^{N}[a_i, b_i]$ and let $$f(\mu)=\max_{\eta\in S}\left(\delta^{\top}\eta-\mu\cdot v^{\top}\eta\right)=\max_{\eta\in S}r(\mu)^{\top}\eta$$ where $\mu\in\mathbb{R}$ and $r(\mu)=\delta-\mu\cdot v$. The LR problem is $\min_{\mu\in\mathbb{R}} f(\mu)$.

For fixed $\mu$, solving the inner problem is (almost) trivial: $$\eta_{j}=\begin{cases} a_{j} & r_{j}<0\\ ? & r_{j}=0\\ b_{j} & r_{j}>0 \end{cases}.$$ The question mark indicates any value you like within $[a_j,b_j]$. The "almost" is because rounding error needs to be dealt with.

Given a trial multiplier value $\hat{\mu}$ with corresponding solution $\hat{\eta}$ as given above, the derivative of $f$ with respect to $\mu$ there is $$f'(\hat{\mu})=-v^{\top}\hat{\eta}$$ (strictly speaking a subgradient, since $f$ is piecewise linear). Since we are minimizing $f$, you want to adjust $\mu$ in the direction $v^{\top}\hat{\eta}$. Take a step in that direction or do a line search in that direction, shrink step sizes occasionally, etc.
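
A minimal sketch of that iteration (my illustration; the $1/k$ step-size schedule is a typical choice, not a tuned one, and delta, v, a, b are assumed to be NumPy float arrays):

```python
import numpy as np

def relaxed_max(delta, v, a, b, mu):
    """Inner maximization at multiplier mu: coordinatewise, eta_j goes to
    whichever bound is favoured by the sign of r_j = delta_j - mu*v_j."""
    r = delta - mu * v
    return np.where(r >= 0, b, a)

def subgradient_search(delta, v, a, b, mu=0.0, step0=1.0, iters=1000, tol=1e-9):
    """Minimize f(mu) by subgradient steps, using f'(mu) = -v^T eta_hat."""
    eta = relaxed_max(delta, v, a, b, mu)
    for k in range(1, iters + 1):
        g = v @ eta                   # move mu in this direction (= -f'(mu))
        if abs(g) < tol:              # v^T eta ~ 0: constraint satisfied
            break
        mu += (step0 / k) * g         # shrinking step sizes
        eta = relaxed_max(delta, v, a, b, mu)
    return mu, eta
```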

I can't tell you how quickly it will converge, but it should parallelize nicely.

Edit: It occurred to me, belatedly, that since $\mu$ is a scalar variable, you could optimize it via golden-section search pretty easily; note that $f$ is a pointwise maximum of functions linear in $\mu$, hence convex and unimodal, so golden-section search applies (again, exploiting the fact that all of the computation parallelizes nicely, and assuming you have a bunch of cores to throw at the problem).
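
For instance, with SciPy's scalar minimizer (a sketch; the toy data is only there to make the snippet self-contained):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(mu, delta, v, a, b):
    # f(mu) = max_{a <= eta <= b} (delta - mu*v)^T eta, computed
    # coordinatewise: eta_j takes whichever bound maximizes r_j * eta_j.
    r = delta - mu * v
    return np.maximum(r * a, r * b).sum()

# toy data just to make the snippet runnable
rng = np.random.default_rng(0)
N = 1_000_000
delta, v = rng.normal(size=N), rng.normal(size=N)
a, b = -np.ones(N), np.ones(N)

res = minimize_scalar(f, args=(delta, v, a, b), method="golden")
mu_star = res.x
```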

Also, I was a bit cavalier about the case $r_j=0$. The value chosen for $\eta_j$ does not matter prior to convergence at the optimal value of $\mu$, and in fact you may well not see $r_j=0$ during the search. Once you've found the optimal multiplier $\mu^*$ (to within tolerance), you compute $r(\mu^*)$, find the index $k$ where $r(\mu^*)_k=0$ (again, to within tolerance), set $\eta_j$ to the bound indicated for $j\neq k$, and use $\eta_k$ to balance the equation ($\eta_k = -(\sum_{j\neq k} v_j \eta_j)/v_k$).
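
That final recovery step might look like this (a sketch continuing from mu_star above; it assumes a single near-zero $r_k$ with $v_k \neq 0$, and one should verify the balancing value stays in $[a_k, b_k]$):

```python
import numpy as np

# continuing from mu_star computed above
r = delta - mu_star * v
eta = np.where(r >= 0, b, a).astype(float)  # bounds indicated by the sign of r
k = int(np.argmin(np.abs(r)))               # index with r_k ~ 0, to tolerance
eta[k] = 0.0
eta[k] = -(v @ eta) / v[k]                  # balance v^T eta = 0
assert a[k] <= eta[k] <= b[k]               # sanity check: must stay feasible
```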