How can I solve this optimization problem - what concrete math algorithm should I use?


I need to run the following optimization problem on mobile phones. Therefore:

  • It needs to run within 0.1 seconds on a mobile phone's CPU (which is, you know, kind of weak).
  • The installation file needs to be small, so I cannot, for example, package a whole CVX distribution into my app.

Since I am new to optimization, I do not know what I should do. Thanks for any suggestions!


EDIT:

Indeed I am a programmer, so the question is not about what concrete code to write, but about which algorithm/solver I should use. For example, should I use gradient descent? ADMM (just an example; I know I should not use that)? Or something else?

What I guess I should do: maybe I can write an optimization algorithm by hand? (For the simplest problems, I might just derive the gradient manually and then implement gradient descent in about 10 lines of code.) But I do not know which algorithm to use (it seems "naive" gradient descent is not enough). Or maybe I should use an optimization package? But is that overkill, since I only need to optimize this one problem?


Problem details:

$$ \begin{aligned} \arg \min_s \ \ & k_1 \left( \max_{i \in \{1,\dots,N\}} \left( s_i + a_i \right) - \min_{i \in \{1,\dots,N\}} \left( s_i + b_i \right) \right) + \\ & k_2 \left( \max_{i \in \{1,\dots,l\}} (s_i + a_i) - \min_{i \in \{1,\dots,l\}} (s_i + b_i) \right) + \\ & k_3 \left( \sum_{i=2}^N |s_i - s_{i-1}| \right) + \\ & k_4 \left( \sum_{i=2}^{N-1} |2 s_i - s_{i-1} - s_{i+1}|^{1.5} \right) \\ \text{s.t.}\ \ & 0 \leq s_i \leq H \\ \text{where}\ \ & a, b, s \in \mathbb{R}^N \\ & l, N \in \mathbb{N}, \;\; l < N \\ & k_1, k_2, k_3, k_4: \text{weights} \\ & H: \text{a number large enough that it does not affect the solution} \end{aligned} $$

Explanations:

  • The "$k_2$" row: the same as the "$k_1$" row, except that it only operates on elements 1 up to $l$ and ignores the remaining elements.
  • The "$k_3$" row: penalizes the slope of $s$.
  • The "$k_4$" row: penalizes the curvature of $s$.

Things that can be changed:

  • The 1.5-th power in row of "k4" can be changed to 2nd power.
  • $0\leq s_i \leq H$ can be modified or deleted

Range:

  • $N \approx 1000$
  • $H \approx 200$
  • $0 \leq a_i \leq H$, $0 \leq b_i \leq H$
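For concreteness, the objective can be evaluated in a few lines of NumPy. This is only my transcription of the formula above (the function name and the convention $k_1=\dots=k_4=1$ by default are mine):

```python
import numpy as np

def objective(s, a, b, l, k1=1.0, k2=1.0, k3=1.0, k4=1.0):
    """Evaluate the objective for a candidate s (names are illustrative)."""
    term1 = k1 * (np.max(s + a) - np.min(s + b))                   # global range
    term2 = k2 * (np.max(s[:l] + a[:l]) - np.min(s[:l] + b[:l]))   # range over the first l entries
    term3 = k3 * np.sum(np.abs(np.diff(s)))                        # "slope" (total variation) penalty
    term4 = k4 * np.sum(np.abs(2*s[1:-1] - s[:-2] - s[2:])**1.5)   # "curvature" penalty
    return term1 + term2 + term3 + term4
```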

2 Answers


This will be much easier to solve efficiently if you're willing to change your objective a bit. Objectives with absolute values don't play nicely with basically anything else, while convex quadratic objectives with convex domains are easy to solve. I would suggest turning the first two parts of your problem into constraints (i.e. say $\max \{s_i+a_i\} \le c_1$ and $\min\{s_i +b_i\} \ge c_2$, which turns into $s_i\le c_1-a_i$ and $s_i \ge c_2-b_i$), turn the $|s_i-s_{i-1}|$ into an $(s_i-s_{i-1})^2$, and solve the constrained convex quadratic optimisation problem. Then re-run it all for a few different values of $c_1$ and $c_2$.
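To illustrate this reformulation: at a fixed $(c_1, c_2)$ the constraints become a simple box, and the smoothed objective (both penalty terms squared, which the question allows) is a convex quadratic that even plain projected gradient descent can handle. A sketch, where the function name, step size, and iteration count are my own choices rather than part of the answer:

```python
import numpy as np

def solve_qp(a, b, c1, c2, H, k3=1.0, k4=1.0, iters=2000, lr=0.02):
    """Projected gradient descent for the smoothed sub-problem at fixed (c1, c2):
    minimise k3*sum((s[i]-s[i-1])^2) + k4*sum((2s[i]-s[i-1]-s[i+1])^2)
    subject to the box max(0, c2-b[i]) <= s[i] <= min(H, c1-a[i])."""
    lo = np.maximum(0.0, c2 - b)
    hi = np.minimum(H, c1 - a)
    assert np.all(lo <= hi), "infeasible (c1, c2) pair"
    s = 0.5 * (lo + hi)                  # feasible starting point
    for _ in range(iters):
        d1 = np.diff(s)                  # s[i+1] - s[i]
        d2 = -np.diff(s, n=2)            # 2s[i] - s[i-1] - s[i+1]
        g = np.zeros_like(s)
        g[1:] += 2*k3*d1                 # gradient of the k3 term
        g[:-1] -= 2*k3*d1
        g[1:-1] += 4*k4*d2               # gradient of the k4 term
        g[:-2] -= 2*k4*d2
        g[2:] -= 2*k4*d2
        s = np.clip(s - lr*g, lo, hi)    # gradient step, then project onto the box
    return s
```

The outer loop over a few $(c_1, c_2)$ values would then pick whichever sub-problem solution gives the smallest total objective.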

Edit: Another idea, you could split the problem into an $i \le l$ and an $i \ge l+1$ sub-problem (if you're able to replace $k_1\max\{s_i+a_i:1 \le i \le n\}+k_2\max\{s_i+a_i:1 \le i \le l\}$ with $(k_1+k_2)\max\{s_i+a_i:1 \le i \le l\}+k_1\max\{s_i+a_i:l+1 \le i \le n\}$). Then you could use a binary search to find a local minimum for $c_3-c_4$ and for $c_1-c_2$ separately. You could come up with some heuristic for the gradient at the crossover point (maybe just minimise $k_4((s_{l-1}-s_l)^2+(s_l-s_{l+1})^2)$, or work out the average gradient $c$ of $\frac{a_i+b_i}{2}$ and minimise $k_4((s_l-s_{l-1}-c)^2+(s_{l+1}-s_l)^2)$).


Disclaimer

Gradient descent methods are not necessarily very fast. This case, as pointed out by 1Rock, may be better handled by solving a simpler surrogate problem that admits better methods. A reasonable heuristic solution can also be constructed, which in many cases might even be "good enough". In any case, some general tips and a Python implementation are presented below.

Heuristics

We can do better than a randomized initial guess. We know that the $k_3$ and $k_4$ portions imply most parameters should be roughly equal. We know that the $k_1$ and $k_2$ portions imply parameters with large $a_i$ should be as small as possible, and parameters with small $b_i$ should be as large as possible. In the implementation below, I use an initial heuristic of $s_i=H/2$ for every component and then increase/decrease each $s_i$ based on $a_i$ and $b_i$.
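As a sketch of that heuristic (the adjustment rule in the linked code may differ; the 0.5 scaling factor here is an arbitrary choice of mine):

```python
import numpy as np

def initial_guess(a, b, H):
    """Heuristic start: begin at H/2, then push s[i] down where a[i] is large
    (to shrink max(s+a)) and up where b[i] is small (to grow min(s+b))."""
    s = np.full_like(a, H / 2.0, dtype=float)
    s -= 0.5 * (a - a.mean())      # large a[i] -> smaller s[i]
    s += 0.5 * (b.mean() - b)      # small b[i] -> larger s[i]
    return np.clip(s, 0.0, H)      # keep the guess feasible
```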

Constraints

Gradient descent can be somewhat messy when constraints are involved. However, this is easily handled by mapping an unconstrained space to the constrained one. In this particular example, I used: $$\operatorname{constraint\_map}(x,l,h)=l+\frac{h-l}{1+\exp(-x)}$$ which maps $x$ into the interval $(l,h)$.
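In code, this logistic mapping is one line (assuming NumPy):

```python
import numpy as np

def constraint_map(x, lo, hi):
    """Map unconstrained x into the open interval (lo, hi) via a logistic curve,
    so the optimizer can work in an unconstrained space."""
    return lo + (hi - lo) / (1.0 + np.exp(-x))
```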

Gradient?

In your case, the gradient is a rather messy piecewise expression. If you wish to sit down and construct the exact gradient, feel free to. However, since small changes in $\vec s$ produce small changes in your objective function, SPSA may be used instead to approximate the gradient using only objective evaluations. Oftentimes, even in high-dimensional problems, SPSA works quite well.
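A single SPSA estimate needs only two evaluations of the objective, regardless of the dimension $N$: perturb all coordinates at once with a random $\pm 1$ vector. A minimal version (names are mine, not from the linked code):

```python
import numpy as np

def spsa_gradient(f, x, eps=1e-2, rng=None):
    """One SPSA gradient estimate: two function evaluations along a random
    +/-1 perturbation; since delta[i] is +/-1, 1/delta[i] equals delta[i]."""
    rng = np.random.default_rng(rng)
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    return (f(x + eps*delta) - f(x - eps*delta)) / (2*eps) * delta
```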

Implementation

You may find a small implementation in Python on GitHub. The main portion of the algorithm is as follows:

  1. Produce a perturbation $dx$ using $px$.
  2. Estimate the gradient $gx$ using $dx$.
  3. Update the solution using $x_{n+1}=x_n-lr\cdot gx$.

On top of this, the following is included:

  • Momentum is used to make iterations more stable.
  • Cycles are used to refresh the momentum as well as tune the learning rate.
  • The learning rate is tuned initially and at every cycle using line search.
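The full implementation lives in the linked GitHub repository; the following is only a minimal sketch of steps 1–3 plus momentum (cycles and line-search tuning omitted for brevity), with the function name and default values my own:

```python
import numpy as np

def minimize_spsa(f, x0, lr=0.02, momentum=0.9, iters=500, eps=1e-2, seed=0):
    """SPSA gradient estimates combined with a momentum update."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(iters):
        delta = rng.choice([-1.0, 1.0], size=x.shape)             # step 1: perturbation
        g = (f(x + eps*delta) - f(x - eps*delta)) / (2*eps) * delta  # step 2: gradient estimate
        v = momentum*v + g                                        # accumulate momentum
        x -= lr*v                                                 # step 3: update
    return x
```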

Remarks

The code works, but it does not meet the desired constraints (0.1 s runtime on a weak CPU). I would suggest trying to find better heuristics (e.g. by using a smaller $N$ and seeing if you can come up with better ideas than I did) and doing a touch of hyperparameter tuning (play around with the numbers to potentially get better results).