Large Scale Quadratic Form with Linear Box Constraints


I have the following quadratic program

$$\min_x x^TAx \qquad \text{s.t.} \quad Ax \in [a,b]^m$$

where the matrix $A$ is positive semidefinite and the same $A$ appears both in the objective function and in the constraint. I would like to solve this problem for a large-scale input matrix $A$ and am looking for a fast solution.

Is there a specific recommendation among optimization methods such as projected gradient, proximal gradient, trust region, active set, and so forth? Any comparison between these methods (and others) would be appreciated.

One option which I think is promising is to formulate your problem as minimizing $f(x) + g(x) + h(Ax)$, where $f(x) = x^T A x$, $g(x) = 0$, and $h(y) = I_{[a,b]^m}(y)$. (That is, $h$ is the convex indicator function of the box $[a,b]^m$.) The function $f$ is differentiable, and $g$ and $h$ have easy proximal operators, so you can solve this optimization problem using the PD3O method (a recent three-operator splitting method).

In this approach, we never have to solve a linear system involving $A$, which is nice because $A$ is large.
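As a concrete sketch of these three ingredients in NumPy (here `A`, `a`, `b`, and the step size `t` are assumed inputs, not part of the original answer):

```python
import numpy as np

# The three pieces of the splitting f(x) + g(x) + h(Ax):
def grad_f(A, x):
    return 2 * A @ x          # gradient of f(x) = x'Ax, with A symmetric

def prox_g(x, t):
    return x                  # g = 0, so its prox is the identity

def prox_h(y, t, a, b):
    return np.clip(y, a, b)   # prox of the box indicator = projection, for any t > 0
```

Only these evaluations (plus products with $A$ and $A^T$) are needed per PD3O iteration.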


If you can afford a single eigendecomposition of $ \boldsymbol{A} $, I'd go with ADMM.
Once you have the eigendecomposition, you can carry out each iteration with matrix multiplications only.

If you can't, you have 2 options:

  1. Solve the linear system with an iterative solver (e.g., Conjugate Gradient). Probably even a few inner iterations will be enough, and warm-starting from the previous solution will make things pretty fast after the first few outer iterations.
  2. Use a primal-dual method to apply the indicator function to an auxiliary (dual) vector variable instead of to $ \boldsymbol{A} \boldsymbol{x} $ directly. You may use the method in @littleO's answer (PD3O - A New Primal Dual Algorithm for Minimizing the Sum of Three Functions with a Linear Operator). There are others as well. Since both terms involve $ \boldsymbol{A} \boldsymbol{x} $, a 3-operator method is required.
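Option 1 can be sketched with a hand-rolled warm-started conjugate gradient (function and parameter names are illustrative, not from the original answer):

```python
import numpy as np

def warm_cg(matvec, b, x0, iters=10, tol=1e-12):
    """Plain conjugate gradient for a symmetric positive definite operator,
    warm-started at x0 (e.g., the solution from the previous outer iteration)."""
    x = x0.copy()
    r = b - matvec(x)         # initial residual
    p = r.copy()              # initial search direction
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Inside ADMM one would pass `matvec = lambda v: lam * A.T @ (A @ v) + A @ v` and warm-start `x0` at the previous `x`, so a small `iters` budget suffices.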

ADMM Solution

Assuming one can apply some factorization (eigendecomposition / SVD) to $ \boldsymbol{A} $, this becomes the best option.

The model:

$$\begin{align*} \arg \min_{ \boldsymbol{x}, \boldsymbol{z} } \quad & \underbrace{\frac{1}{2} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x} }_{ f \left( \boldsymbol{x} \right) } + \underbrace{ {\pi}_{\mathcal{C}} \left( \boldsymbol{z} \right) }_{ g \left( \boldsymbol{z} \right) } \\ \text{subject to} \quad & \begin{aligned} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{z} & = 0 \\ \end{aligned} \end{align*}$$

Where ${\pi}_{\mathcal{C}} \left( \boldsymbol{x} \right)$ is the indicator function over the set $\mathcal{C} = {\left[ a, b \right]}^{m}$.

We have to define 2 steps, one per function:

  • $ \arg \min_{\boldsymbol{x}} \frac{1}{2} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x} + \frac{\lambda}{2} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} $. Given the eigendecomposition of the operator $ \boldsymbol{A} = \boldsymbol{F} \boldsymbol{D} \boldsymbol{F}^{T} $, the solution is $ \hat{\boldsymbol{x}} = \boldsymbol{F} \hat{\boldsymbol{D}} \boldsymbol{F}^{T} \boldsymbol{b} $ where $ \hat{{D}}_{ii} = \frac{ \lambda {D}_{ii} }{ \lambda {D}_{ii}^{2} + {D}_{ii} } = \frac{ \lambda }{ \lambda {D}_{ii} + 1 } $. This formulation allows updating $\lambda$ on the fly (which can improve convergence).
  • $ \operatorname{Prox}_{\lambda {\pi}_{\mathcal{C}} \left( \cdot \right)} \left( \boldsymbol{y} \right) = \operatorname{Proj}_{\mathcal{C}} \left( \boldsymbol{y} \right) = \max \left( \min \left( \boldsymbol{y}, b \right), a \right) $.

Using those two building blocks, all that is needed is to keep iterating.
In case $\boldsymbol{A}$ is too large for a decomposition (e.g., sparse), one could use an iterative solver for $ \left( \lambda \boldsymbol{A}^{T} \boldsymbol{A} + \boldsymbol{A} \right) \boldsymbol{x} = \lambda \boldsymbol{A}^{T} \boldsymbol{y} $.
Even a few inner iterations may be enough, as the accuracy improves with each ADMM iteration when warm-starting from the previous solution.
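A minimal NumPy sketch of this ADMM scheme (scaled dual form with a fixed penalty $\rho$ playing the role of $\lambda$ above; parameter values are illustrative):

```python
import numpy as np

def admm_box(A, a, b, rho=1.0, iters=200):
    """ADMM sketch for min 1/2 x'Ax  s.t.  Ax in [a,b]^m, with A symmetric PSD.
    Splitting: f(x) = 1/2 x'Ax, g(z) = box indicator, constraint Ax - z = 0."""
    d, F = np.linalg.eigh(A)             # one-time eigendecomposition A = F diag(d) F'
    dhat = rho / (rho * d + 1.0)         # spectral filter from the x-update above
    n = A.shape[0]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u is the scaled dual variable
    for _ in range(iters):
        x = F @ (dhat * (F.T @ (z - u)))   # x-update: matrix multiplications only
        Ax = A @ x
        z = np.clip(Ax + u, a, b)          # z-update: projection onto the box
        u += Ax - z                        # dual update
    return x
```

For example, with $\boldsymbol{A} = \operatorname{diag}(1,2,3)$ and the box $[1,2]^3$, the per-coordinate optimum is $x_i = 1 / D_{ii}$, and the iterates converge to $(1, 1/2, 1/3)$.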

PD3O Solution

The PD3O algorithm is a relatively recent algorithm with guaranteed convergence for a 3-operator splitting.
Its main advantage in this context is that it avoids solving a linear system.

The model:

$$ \arg \min_{\boldsymbol{x}} \underbrace{\frac{1}{2} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x} }_{ f \left( \boldsymbol{x} \right) } + \underbrace{0}_{g \left( \boldsymbol{x} \right)} + \underbrace{ {\pi}_{\mathcal{C}} \left( \boldsymbol{A} \boldsymbol{x} \right) }_{ h \left( \boldsymbol{A} \boldsymbol{x} \right) } $$

The building blocks needed are:

  • For $ f \left( \cdot \right) $ its gradient $ {\nabla}_{\boldsymbol{x}} f $ is needed.
  • For $ g \left( \cdot \right), \; h \left( \cdot \right) $ their proximal operators are needed.

Given that, the iterations are pretty simple.
Yet the algorithm is much slower than the ADMM above and is highly sensitive to the condition number of $ \boldsymbol{A} $.

Remark: Since this is a primal-dual algorithm, it actually requires the proximal operator of the conjugate function of $ h \left( \cdot \right) $. Yet given the proximal operator of a function, the proximal operator of its conjugate can be easily calculated via the Moreau identity:

$$ \operatorname{Prox}_{ \lambda {h}^{\ast} \left( \cdot \right) } \left( \boldsymbol{y} \right) = \boldsymbol{y} - \lambda \operatorname{Prox}_{ \alpha h \left( \cdot \right) } \left( \alpha \boldsymbol{y} \right), \; \alpha = \frac{1}{\lambda} $$
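As an illustration, here is a sketch of these iterations for the present problem in NumPy. With $g = 0$ the primal prox is the identity, so the loop reduces to a PAPC-type primal-dual update (my reading of the $g = 0$ special case; consult the PD3O paper for the exact general form). The dual prox uses the Moreau identity, and the step sizes are one admissible choice:

```python
import numpy as np

def pd3o_box(A, a, b, iters=500):
    """PD3O-type sketch for min 1/2 x'Ax + I_[a,b](Ax), with g = 0.
    Only matrix-vector products and box projections are used (no linear solves)."""
    n = A.shape[0]
    L = np.linalg.eigvalsh(A)[-1]      # Lipschitz constant of grad f(x) = Ax
    gamma = 1.0 / L                    # primal step size (needs gamma < 2 / L)
    delta = 1.0 / (gamma * L ** 2)     # dual step size (needs gamma*delta*||A||^2 <= 1)
    x = np.zeros(n)
    s = np.zeros(n)                    # dual variable attached to the constraint on Ax
    for _ in range(iters):
        x_new = x - gamma * (A @ x) - gamma * (A.T @ s)      # primal gradient step
        w = x_new - gamma * (A @ x_new) - gamma * (A.T @ s)  # gradient-corrected point
        y = s + delta * (A @ w)
        # prox of h* via the Moreau identity: y - delta * Proj_[a,b](y / delta)
        s = y - delta * np.clip(y / delta, a, b)
        x = x_new
    return x
```

Note that each iteration costs only a few products with $\boldsymbol{A}$, which is the whole point of avoiding the factorization.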

Results

I implemented 3 algorithms for this problem: the ADMM and PD3O methods above, plus Chambolle-Pock (a 2-operator primal-dual method).

[Convergence plots: relative objective error vs. iteration number for the two test cases, comparing ADMM, PD3O, and Chambolle-Pock]

The y axis shows $ \frac{ \left| {f}^{\star} - {f}_{i} \right| }{ \left| {f}^{\star} \right| } $ in dB, where $ {f}^{\star} $ is the optimal objective value calculated by a Disciplined Convex Programming (DCP) solver.

The problem in both cases has the same dimensions, $ \boldsymbol{A} \in \mathbb{R}^{m \times n}, \; m = n = 500 $.
Yet the convergence speed of the primal-dual methods differs completely between the two cases, while ADMM stays robust. The reason is the different condition number in the 2 cases.

Iteration-wise, the work of the ADMM is much heavier, so the actual run time should be measured.

The Julia code is available at my StackExchange Mathematics Q3042354 GitHub Repository (Look at the Mathematics\Q3042354 folder).

Remark: Since Chambolle-Pock is a 2-operator primal-dual method, it still requires solving the linear system. Hence ADMM is still preferred.

Remark: If $0 \in \left[ a, b \right]$ then the problem is solved by the zero vector $0 \cdot \boldsymbol{1}$, since $ \boldsymbol{A} \in \mathcal{S}_{+}$ makes the objective nonnegative and $\boldsymbol{A} \boldsymbol{0} = \boldsymbol{0}$ is feasible. So for a nontrivial problem, $a$ and $b$ must have the same sign.
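A quick numerical sanity check of this remark (random PSD matrix and box sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                                   # a random positive semidefinite matrix
a, b = -1.0, 1.0                              # a box that contains 0
x = np.zeros(4)
assert np.all((A @ x >= a) & (A @ x <= b))    # x = 0 is feasible: A @ 0 = 0 is in the box
assert x @ A @ x == 0.0                       # and attains the minimum, since x'Ax >= 0
```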