Algorithms For Large-Scale $\ell_{\infty}$ Minimization


The general problem I want to solve is well studied:

$$ \min_x \Vert Ax\Vert_\infty \;\;\; \mathrm{s.t.} \;\;\; Bx=c, $$

which is equivalent to the following linear program:

$$ \min_{t,x} \, t \;\;\; \mathrm{s.t.} \;\;\; -t \leq Ax \leq t \, \wedge \, Bx=c $$

There are several blackbox solvers available for the above. My problem is that $A$ is huge (too big to fit in memory), so it is completely impractical to specify as a matrix. However, I can efficiently compute $Ax$ with a function, but the solvers do not accept function handles. I had a similar problem in the past when minimizing the $\ell_2$ norm of a large-scale linear system, and was able to solve it following the suggestion of using a projected gradient method. This time the large-scale linear system appears in the constraints, which means I would again need to pass $A$ to the solver used to project the gradient onto the feasible region. Again, due to the size of $A$, this is not doable.
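For illustration, such a matrix-free $Ax$ can be wrapped as an operator object that solvers built around function handles understand. The first-difference operator below is a hypothetical stand-in for $A$ (SciPy's `LinearOperator` is one common container for this); the real $A$ would simply supply its own `matvec`/`rmatvec` functions:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

n = 1_000_000
# Hypothetical stand-in for A: the (n-1) x n first-difference operator.
# Storing it densely would take ~8 TB; the function handles need none of it.
A_op = LinearOperator(
    (n - 1, n),
    matvec=lambda x: np.diff(x),                                           # A @ x
    rmatvec=lambda u: np.concatenate(([-u[0]], u[:-1] - u[1:], [u[-1]])),  # A.T @ u
)
x = np.linspace(0.0, 1.0, n)
print(np.abs(A_op @ x).max())   # ||Ax||_inf, without ever forming A
```

The `rmatvec` (adjoint) handle is worth providing up front: most first-order and Krylov methods need products with $A^T$ as well as $A$.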

Any ideas on algorithms or software to solve the above would be greatly appreciated.

On BEST ANSWER

There are many methods. Here I will suggest one: formulating the problem as a sum of two non-smooth functions whose proximal operators are (relatively) easy to compute. You can then use any method for minimizing a sum of two non-smooth functions, such as Douglas-Rachford splitting.

You can re-formulate it as: $$ \min_{x,y} ||y||_{\infty} \quad \mathrm{s. t.} ~ Bx = c, y = Ax $$

As a reminder, the indicator of a set $C$, denoted by $\delta_C(x)$ is defined as: $$ \delta_C(x) = \begin{cases} 0 & x \in C \\ \infty & x \notin C \end{cases} $$
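To connect indicators to the proximal framework: for any $t > 0$, the proximal operator of an indicator function does not depend on $t$ and reduces to a Euclidean projection,

$$ \operatorname{prox}_{t\delta_C}(v) = \arg\min_{u} \, \delta_C(u) + \tfrac{1}{2}\Vert u - v \Vert_2^2 = P_C(v), $$

so encoding constraints with indicator functions turns their proximal steps into projections onto the constraint sets.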

Now we re-write the optimization problem using indicator functions: $$ \min_{x, y} \underbrace{||y||_{\infty}}_{f_1} + \underbrace{\delta_{\{x:Bx = c\}}(x) + \delta_{\{(x,y):y = Ax\}}(x,y)}_{f_2} $$

Methods that minimize such functions will require computing the proximal operators of $f_1$ and $f_2$.

  • Since $f_1$ is the infinity norm, its proximal operator can be computed via the Moreau decomposition: $\mathrm{prox}_{t\Vert\cdot\Vert_\infty}(v) = v - P_{tB_1}(v)$, where $P_{tB_1}$ is the Euclidean projection onto the $\ell_1$-norm ball of radius $t$.
  • The proximal operator of $f_2$ is the projection onto the set: $$ \left \{ (x, y): \begin{bmatrix} A&-I\\B&0 \end{bmatrix} \begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0\\c\end{bmatrix} \right \} $$ This projection cannot be computed exactly given the large scale of the problem. However, the conjugate gradient method can compute it up to a desired accuracy using only matrix-vector products with $A$, $B$, and their adjoints.
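A minimal sketch of the whole scheme, assuming small dense stand-ins for $A$ and $B$ purely to keep the demo self-contained; in the real problem, only the function handles `A_mv`, `A_rmv`, `B_mv`, `B_rmv` would exist. The stepsize `t = 1.0` and the iteration count are arbitrary choices, and the $\ell_1$-ball projection is the standard sort-and-threshold method:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, m, p = 6, 8, 2                  # x in R^n, A is m x n, B is p x n
A = rng.standard_normal((m, n))    # small dense stand-ins, only for the demo;
B = rng.standard_normal((p, n))    # in the real problem only the handles exist
c = rng.standard_normal(p)

A_mv, A_rmv = (lambda x: A @ x), (lambda u: A.T @ u)   # your function handles
B_mv, B_rmv = (lambda x: B @ x), (lambda v: B.T @ v)

def proj_l1_ball(v, r=1.0):
    """Euclidean projection onto the l1 ball of radius r (sort-based method)."""
    if np.abs(v).sum() <= r:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - r)[0][-1]
    theta = (css[rho] - r) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, t):
    """prox of t*||.||_inf via the Moreau decomposition: v - P_{t B_1}(v)."""
    return v - t * proj_l1_ball(v / t)

def proj_affine(x, y):
    """Project (x, y) onto {Ax - y = 0, Bx = c}: CG on the normal equations."""
    def M_mv(z):                       # M = [[A, -I], [B, 0]]
        return np.concatenate([A_mv(z[:n]) - z[n:], B_mv(z[:n])])
    def Mt_mv(w):
        return np.concatenate([A_rmv(w[:m]) + B_rmv(w[m:]), -w[:m]])
    MMt = LinearOperator((m + p, m + p), matvec=lambda w: M_mv(Mt_mv(w)))
    z = np.concatenate([x, y])
    b = np.concatenate([np.zeros(m), c])
    w, _ = cg(MMt, M_mv(z) - b)        # inexact projection, as in the answer
    z = z - Mt_mv(w)
    return z[:n], z[n:]

# Douglas-Rachford on f1(x, y) = ||y||_inf and f2 = indicator of the affine set.
t = 1.0                                # arbitrary stepsize; any t > 0 works
zx, zy = np.zeros(n), np.zeros(m)
for _ in range(3000):
    px, py = zx, prox_linf(zy, t)                   # prox of t*f1 acts on y only
    qx, qy = proj_affine(2 * px - zx, 2 * py - zy)  # prox of t*f2 = projection
    zx, zy = zx + qx - px, zy + qy - py
x, y = proj_affine(px, py)             # final projection gives a feasible point
print("||Ax||_inf =", np.abs(A_mv(x)).max(), " ||Bx - c|| =", np.linalg.norm(B_mv(x) - c))
```

At scale, the dominant cost per iteration is the CG solve inside `proj_affine`; warm-starting CG from the previous iterate's solution and loosening its tolerance in early iterations are common ways to cheapen it.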