Does it make sense to optimise a problem with linear objective and nonlinear constraints with Sequential Quadratic Programming?


The idea of Sequential Quadratic Programming is to transform the nonlinear problem into many quadratic problems by:

1) Replacing the objective function with its quadratic approximation.
2) Replacing the nonlinear constraints with their linear approximations.

The idea looks interesting and I wish to apply it for solving a problem which has

  • Linear Objective Function
  • Nonlinear constraints

Step 1 is not needed in this case since the objective is already linear.

Is Sequential Quadratic Programming still a good algorithm for this kind of problem? Is it likely to be outperformed by other algorithms that are specialised for nonlinear optimisation with a linear objective function?

Best Answer

You've made an error in your assumptions. SQP doesn't replace the objective function with a quadratic approximation of itself. Rather, it replaces the objective function with a quadratic approximation of the Lagrangian. Since the Lagrangian contains pieces of the constraints, the algorithm still pays off.

In order to see this, it's easiest to just derive things from Newton's method. Given a problem $$ \min\limits_{x\in X}\{f(x) : g(x)=0\} $$ the first order necessary conditions are $$\begin{array}{rl} \nabla f(x) + g^\prime(x)^*y &= 0\\ g(x)&=0 \end{array}$$ Here, $^*$ means adjoint, which is just the transpose when working in $\ell^2$. This gives a Newton's method system of $$ \begin{bmatrix} \nabla^2 f(x) + (g^{\prime\prime}(x)\cdot)^*y & g^\prime(x)^*\\ g^\prime(x) & 0 \end{bmatrix} \begin{bmatrix} \delta x\\ \delta y\\ \end{bmatrix} = -\begin{bmatrix} \nabla f(x) + g^\prime(x)^*y\\ g(x) \end{bmatrix} $$ Here, $(g^{\prime\prime}(x)\cdot)^*y=\sum\limits_{i=1}^m y_i\nabla^2 g_i(x)$ when working in $\mathbb{R}^m$ with the $\ell^2$ inner product.

It turns out that this is just the first order optimality conditions for the problem $$\begin{array}{rcl} \min\limits_{\delta x\in X}&&\frac{1}{2}\langle (\nabla^2 f(x) + (g^{\prime\prime}(x)\cdot)^*y)\delta x,\delta x\rangle + \langle \nabla f(x) + g^\prime(x)^*y, \delta x\rangle + f(x) + \langle g(x),y\rangle\\ st&& g^\prime(x)\delta x + g(x) = 0 \end{array}$$ Here, $\langle x,y\rangle$ denotes the inner product, which, again, is just $x^Ty$ if we're in $\ell^2$. Note that the objective here is not a quadratic approximation of the original objective; it's a quadratic approximation of the Lagrangian, which means it has bits of $g$ in it. Therefore, there's still a reason for the SQP methodology even though $f$ is linear.
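To make this concrete, here's a minimal numerical sketch (the toy problem is my own illustration, not from the original post): minimise the linear objective $x_0+x_1$ subject to $x_0^2+x_1^2=1$, iterating the Newton system above. Since $\nabla^2 f = 0$, all of the curvature in the system comes from the constraint term $y\,\nabla^2 g$.

```python
import numpy as np

# Toy problem: min x0 + x1  s.t.  x0^2 + x1^2 = 1.
# f is linear, so Hess f = 0 and the Hessian block is y * Hess g = 2y*I:
# the curvature in the subproblem comes entirely from the constraint.
grad_f = np.array([1.0, 1.0])

def newton_step(x, y):
    g = x @ x - 1.0                        # constraint residual g(x)
    Jg = 2.0 * x                           # constraint Jacobian g'(x)
    H = 2.0 * y * np.eye(2)                # Hessian of the Lagrangian
    KKT = np.block([[H, Jg[:, None]],
                    [Jg[None, :], np.zeros((1, 1))]])
    rhs = -np.concatenate([grad_f + y * Jg, [g]])
    d = np.linalg.solve(KKT, rhs)
    return x + d[:2], y + d[2]

x, y = np.array([-0.6, -0.6]), 1.0
for _ in range(20):
    x, y = newton_step(x, y)
print(x, y)  # x -> (-1/sqrt(2), -1/sqrt(2)), y -> 1/sqrt(2)
```

The iterates converge to $x=-(1,1)/\sqrt{2}$ with multiplier $y=1/\sqrt{2}$, even though the objective contributes nothing to the Hessian block.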

Now, as to whether or not some other algorithm is faster, it's hard to tell and depends on the structure of the problem. SQP is a hammer and tends to work very well. Likely, it's the best place to start.
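For instance (my example, not from the original answer), SciPy's SLSQP solver is an SQP implementation that handles exactly this shape of problem. Here's a minimal sketch on a toy instance: minimise the linear objective $x_0 + x_1$ on the unit circle $x_0^2 + x_1^2 = 1$.

```python
import numpy as np
from scipy.optimize import minimize

# Linear objective: f(x) = x0 + x1, so the gradient is constant.
c = np.array([1.0, 1.0])

# Nonlinear equality constraint: x0^2 + x1^2 - 1 = 0 (the unit circle).
cons = {"type": "eq",
        "fun": lambda x: x[0]**2 + x[1]**2 - 1.0,
        "jac": lambda x: 2.0 * x}

res = minimize(lambda x: c @ x, x0=np.array([-0.5, -0.5]),
               jac=lambda x: c, method="SLSQP", constraints=[cons])
print(res.x)  # approximately -(1, 1)/sqrt(2)
```

The solver converges to $-(1,1)/\sqrt{2}$, the point on the circle where the linear objective is smallest.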


Edit 1

For a derivation and explanation of SQP algorithms, see Nocedal and Wright's Numerical Optimization, chapter 18. There you'll find both ways of viewing SQP: from the Newton-system point of view as well as from the quadratic-approximation point of view.

As a final note, in that reference, you'll find the form of the quadratic approximation that the NEOS page uses, where there's no gradient of the Lagrangian, but just the gradient of the objective. The way we can obtain this is to take the Newton system:

$$ \begin{bmatrix} \nabla^2 f(x) + (g^{\prime\prime}(x)\cdot)^*y & g^\prime(x)^*\\ g^\prime(x) & 0 \end{bmatrix} \begin{bmatrix} \delta x\\ \delta y\\ \end{bmatrix} = -\begin{bmatrix} \nabla f(x) + g^\prime(x)^*y\\ g(x) \end{bmatrix} $$

Then, notice that $y_+ = y + \delta y$, since $\delta y$ is a step toward a new Lagrange multiplier. If we instead solve for the new multiplier directly, we can substitute $\delta y = y_+ - y$ and get

$$ \begin{bmatrix} \nabla^2 f(x) + (g^{\prime\prime}(x)\cdot)^*y & g^\prime(x)^*\\ g^\prime(x) & 0 \end{bmatrix} \begin{bmatrix} \delta x\\ y_+ - y\\ \end{bmatrix} = -\begin{bmatrix} \nabla f(x) + g^\prime(x)^*y\\ g(x) \end{bmatrix} $$

Next, if we multiply this out, we get

$$ \begin{bmatrix} \nabla^2 f(x) + (g^{\prime\prime}(x)\cdot)^*y & g^\prime(x)^*\\ g^\prime(x) & 0 \end{bmatrix} \begin{bmatrix} \delta x\\ y_+\\ \end{bmatrix} = -\begin{bmatrix} \nabla f(x)\\ g(x) \end{bmatrix} $$

which corresponds to the quadratic approximation

$$\begin{array}{rcl} \min\limits_{\delta x\in X}&&\frac{1}{2}\langle (\nabla^2 f(x) + (g^{\prime\prime}(x)\cdot)^*y)\delta x,\delta x\rangle + \langle \nabla f(x), \delta x\rangle + f(x) + \langle g(x),y\rangle\\ st&& g^\prime(x)\delta x + g(x) = 0 \end{array}$$

This is what the page you linked to uses. I was in error when I said it was incorrect. It is correct. So is the one that I posted above originally. The difference between the two is how we interpret the Lagrange multiplier for the quadratic subproblem. In one, we view the Lagrange multiplier as the step. In the other, we view it as the new Lagrange multiplier directly.
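This equivalence is easy to check numerically. Here's a small sketch (the random data is purely for illustration): both systems share the same KKT matrix, and solving for $(\delta x, \delta y)$ versus $(\delta x, y_+)$ must give the same step with $y_+ = y + \delta y$.

```python
import numpy as np

# Random small instance to check the algebra. The two right-hand sides are
# -[grad_f + Jg^T y; g] (solve for dy) and -[grad_f; g] (solve for y_plus).
rng = np.random.default_rng(0)
n, m = 4, 2
Jg = rng.standard_normal((m, n))                # constraint Jacobian g'(x)
A = rng.standard_normal((n, n)); H = A @ A.T    # Lagrangian Hessian block
grad_f = rng.standard_normal(n)                 # objective gradient
g = rng.standard_normal(m)                      # constraint residual
y = rng.standard_normal(m)                      # current multiplier

KKT = np.block([[H, Jg.T], [Jg, np.zeros((m, m))]])
sol1 = np.linalg.solve(KKT, -np.concatenate([grad_f + Jg.T @ y, g]))
sol2 = np.linalg.solve(KKT, -np.concatenate([grad_f, g]))

dx1, dy = sol1[:n], sol1[n:]
dx2, y_plus = sol2[:n], sol2[n:]
print(np.allclose(dx1, dx2), np.allclose(y + dy, y_plus))  # True True
```

Same matrix, same primal step; only the interpretation of the dual variable differs.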

I'll still contend that, most of the time, we're better off just using the Newton's method form directly, which is the linear system above. There, we have all of the information directly and don't need to remember any implicit information, such as what the Lagrange multiplier for the quadratic approximation really means. Really, most of the game in these algorithms is how we attack that system and the implications of doing so.

Remember, Newton's method is not guaranteed to converge to a solution unless we're sufficiently close to the optimum, so the algorithm needs to be globalized with either a line-search or trust-region method. Line-search methods generally need a positive definite Hessian approximation, so they'll replace the Hessian with a quasi-Newton approximation or use a Hessian modification. Trust-region methods need something that looks like a quadratic model but is positive definite near optimality, so they'll actually break this up into two steps and use a composite-step SQP method in order to meet this requirement.

All of that game is generally worked from the linear-system point of view and not from the quadratic-approximation point of view. It's sort of cute to view an algorithm for optimization as a sequence of optimization problems, since we're closing the loop. At the end of the day, the only thing we can really solve is linear systems, so most of the game is generating a linear system that's easy to work with. Quadratic problems have linear systems as their optimality conditions, which is how we end up here.