I was wondering what the exact meaning of Sequential Quadratic Programming (SQP) is, and I have found many results:
> Sequential quadratic programming (SQP) is an iterative method for constrained nonlinear optimization. SQP methods are used on mathematical problems for which the objective function and the constraints are twice continuously differentiable.

> Sequential quadratic programming (SQP) is one of the most effective methods for nonlinearly constrained optimization problems. The method generates steps by solving quadratic subproblems; it can be used both in line search and trust-region frameworks.

> Sequential quadratic programming (SQP) is a class of algorithms for solving non-linear optimization problems (NLP) in the real world.
etc.
None of these definitions makes sense to me, because every method I have seen so far looks like SQP to me; they do not properly exclude the alternatives.
So,
- What is not SQP?
- What is the SQP definition exactly?
- What are the alternatives to SQP?
- Is SQP contradictory to Interior Point (IP) method?
Sequential Quadratic Programming (SQP) is a method to solve constrained nonlinear optimization problems. The method seeks an optimal solution by iteratively (sequentially) solving Quadratic Programming (QP) subproblems.
In the sequence of iterations, each iteration consists of:
- approximating the problem around the current iterate by a QP subproblem (through linearization, as shown below),
- solving that QP subproblem for a search direction $\mathbf{p}$,
- updating the iterate with that direction.
The algorithm may terminate when a certain tolerance is reached:
$$|J(\mathbf{x}_{n+1})-J(\mathbf{x}_{n})| < \text{tolerance}$$
where $J(\mathbf{x})$ is the cost function that we want to minimize, $\mathbf{x}$ is the vector of decision variables, and subscripts $n+1$ and $n$ denote two subsequent iterations.
Why don't we just solve the main nonlinear problem the same way we solve QP subproblems in each iteration?
An NLP problem is formulated as:
$$ \begin{align} \text{min } J(\mathbf{x}) \\ \text{subject to } \mathbf{g}(\mathbf{x}) \leq 0 \\ \text{subject to } \mathbf{h}(\mathbf{x}) = 0 \end{align} $$
And a problem with quadratic form is specifically formulated as: $$ \begin{align} \text{min } J(\mathbf{x}) = \dfrac{1}{2}\mathbf{x}^T\mathbf{Q}\mathbf{x} + \mathbf{c}^T\mathbf{x}\\ \text{subject to } A\mathbf{x} - b \leq 0 \\ \text{subject to } E\mathbf{x} - d = 0 \end{align} $$
where the second and third equations are the inequality and equality constraints of each problem, respectively.
The usual way to solve either of them is to find the value of $\mathbf{x}$ at which the first derivative of the objective function is zero and the second derivative is positive semi-definite (in the constrained case, these conditions are stated for the Lagrangian, through the KKT conditions):
$$ J_x(\mathbf{x}) = \mathbf{0}, \qquad J_{xx}(\mathbf{x}) \succeq 0 $$
So why do we need to treat the main problem as an iteration of QP subproblems? Why don't we use the same techniques we use with QP to solve the main problem?
Because once the problem is expressed in quadratic form, it is relatively easy to employ (matrix) algebra and solve the system of equations that results from the derivatives above. Both the main problem and the QP subproblems will have a nonlinear objective function, but the QP will only have square ($x_1^2, x_2^2, ...$) and/or bilinear ($x_1x_2, ...$) terms.
The presence of other types of nonlinearities prevents us from using simple algebra to obtain the solution to the derivatives above.
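To make this concrete, here is a minimal sketch (with made-up matrices and vectors, chosen only for illustration) of how an equality-constrained QP is solved in one shot with plain matrix algebra: setting the derivative of the Lagrangian to zero yields a single linear system, the KKT system.

```python
import numpy as np

# Toy equality-constrained QP (made-up numbers, for illustration):
#   min 1/2 x^T Q x + c^T x   subject to   E x = d
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])      # positive-definite Hessian
c = np.array([-2.0, -4.0])
E = np.array([[1.0, 1.0]])      # single equality constraint: x1 + x2 = 1
d = np.array([1.0])

# Setting the derivative of the Lagrangian to zero gives ONE linear system
# (the KKT system) -- this is the "simple algebra" a QP allows:
#   [ Q  E^T ] [ x      ]   [ -c ]
#   [ E   0  ] [ lambda ] = [  d ]
n, m = Q.shape[0], E.shape[0]
KKT = np.block([[Q, E.T],
                [E, np.zeros((m, m))]])
rhs = np.concatenate([-c, d])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]
print(x)   # optimal point: [0. 1.]
```

One linear solve gives the exact optimum; with a general nonlinear objective or constraints, no such single solve exists, which is precisely why SQP iterates.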
The transformation of the main problem into a QP problem is done through linearization. We can use a Taylor expansion to approximate any twice-differentiable nonlinear $J(\mathbf{x})$ by a quadratic expression:
$$ J(\mathbf{x}) \approx J(\mathbf{\bar{x}}) + J_x^T(\mathbf{\bar{x}})(\mathbf{x}-\mathbf{\bar{x}}) + \dfrac{1}{2}(\mathbf{x}-\mathbf{\bar{x}})^TJ_{xx}(\mathbf{\bar{x}})(\mathbf{x}-\mathbf{\bar{x}}) $$
If we get rid of the constant term at the beginning (which will vanish in the derivatives anyway), and define $$ \mathbf{p} = \mathbf{x}-\mathbf{\bar{x}} $$
Then we obtain the quadratic form mentioned earlier:
$$ J(\mathbf{x}) \approx J_x^T(\mathbf{\bar{x}})\mathbf{p} + \dfrac{1}{2}\mathbf{p}^TJ_{xx}(\mathbf{\bar{x}})\mathbf{p} $$
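As a quick numerical sanity check of this quadratic model, here is a sketch with a made-up test function (the function, expansion point, and step are arbitrary choices for illustration):

```python
import numpy as np

# Made-up smooth test function:  J(x) = exp(x1) + x2^4
def J(x):
    return np.exp(x[0]) + x[1]**4

def J_x(x):      # gradient
    return np.array([np.exp(x[0]), 4*x[1]**3])

def J_xx(x):     # Hessian
    return np.array([[np.exp(x[0]), 0.0],
                     [0.0, 12*x[1]**2]])

x_bar = np.array([0.5, 1.0])     # expansion point
p = np.array([0.01, -0.02])      # small step
x = x_bar + p

# Second-order Taylor model around x_bar (including the constant term,
# so we can compare function values directly):
quad_model = J(x_bar) + J_x(x_bar) @ p + 0.5 * p @ J_xx(x_bar) @ p

print(abs(J(x) - quad_model))    # tiny: the model is accurate near x_bar
```

The mismatch is of third order in $\|\mathbf{p}\|$, so the quadratic model is a very good local stand-in for the true cost function.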
Therefore, when we solve the QP subproblems, we are obtaining $\mathbf{p}$, which we then use to update $\mathbf{x}$ via:
$$ \mathbf{x}_{n+1} = \mathbf{x}_{n} + \mathbf{p} $$
In other words, $\mathbf{p}$ is the search direction along which the optimization moves toward an optimal value.
I omitted the transformation of the constraints to keep it shorter; the equations of the constraints are also linearized around the same point as the cost function.
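Putting the pieces together, here is a minimal SQP sketch on a made-up toy problem with one nonlinear equality constraint. The problem, starting point, and tolerance are arbitrary; for simplicity the quadratic model uses the objective Hessian $J_{xx}$ as in the text above (practical SQP codes typically use the Hessian of the Lagrangian, or a quasi-Newton approximation of it):

```python
import numpy as np

# Toy problem (made up for illustration):
#   min J(x) = x1^2 + x2^2   subject to   h(x) = x1^2 + x2 - 1 = 0
# Analytic minimizer: x* = (1/sqrt(2), 1/2), with J(x*) = 3/4.
def J(x):    return x[0]**2 + x[1]**2
def J_x(x):  return 2*x                                  # gradient
def J_xx(x): return 2*np.eye(2)                          # Hessian (constant here)
def h(x):    return np.array([x[0]**2 + x[1] - 1.0])     # equality constraint
def h_x(x):  return np.array([[2*x[0], 1.0]])            # constraint Jacobian

x = np.array([1.0, 1.0])        # initial guess
tol = 1e-10
for _ in range(100):
    J_prev = J(x)
    # QP subproblem around the current iterate x_bar = x:
    #   min  J_x^T p + 1/2 p^T J_xx p
    #   s.t. h(x_bar) + h_x(x_bar) p = 0     (linearized constraint)
    # solved through its KKT linear system:
    H, g = J_xx(x), J_x(x)
    A, b = h_x(x), -h(x)
    m = A.shape[0]
    KKT = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(KKT, np.concatenate([-g, b]))
    p = sol[:2]                 # search direction from the QP subproblem
    x = x + p                   # x_{n+1} = x_n + p
    if abs(J(x) - J_prev) < tol:
        break

print(x, J(x))   # close to (0.7071, 0.5) and 0.75
```

Each pass of the loop is exactly the iteration described above: linearize around the current point, solve the QP subproblem for $\mathbf{p}$, update the iterate, and stop once the change in cost falls below the tolerance.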
Regarding the other questions at the bottom:
What is not SQP? SQP is just a method. Any method that does not iteratively reduce an optimization problem to a QP problem is not an SQP method.
What are the alternatives to SQP? I believe the interior-point method, trust-region methods, and SLP (successive linear programming) are among the alternatives.
Is SQP contradictory to Interior Point (IP) method? They are two different methods for solving NLPs. SQP advances along the constraint boundary of the feasible region, while IP advances through the interior (it avoids the boundary until just before the optimal solution is found, which it does by appending a barrier function to the objective function). IP typically requires fewer iterations, but each iteration involves a larger system of equations. What they have in common is that both iteratively approach the optimal solution, updating the iterate at each step through a Newton-type method.
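To illustrate the barrier idea, here is a sketch on a made-up one-dimensional problem; the barrier-parameter schedule and the step damping are arbitrary choices:

```python
# Interior-point flavour on a made-up 1-D problem:
#   min x   subject to   x >= 1      (optimum at x = 1, ON the boundary)
# IP replaces the constraint by a log-barrier and shrinks its weight mu:
#   min  x - mu * log(x - 1)         (minimizer: x = 1 + mu, strictly interior)
x = 2.0
for mu in [1.0, 0.1, 0.01, 0.001]:
    # A few Newton steps on the barrier problem for this mu:
    for _ in range(50):
        grad = 1.0 - mu / (x - 1.0)
        hess = mu / (x - 1.0)**2
        step = -grad / hess
        # Damp the step so the iterate stays strictly inside x > 1:
        while x + step <= 1.0:
            step *= 0.5
        x = x + step
    print(mu, x)   # x tracks 1 + mu, approaching the boundary as mu -> 0
```

Note how the iterates always stay strictly feasible and only creep toward the boundary as the barrier weight $\mu$ is reduced, whereas an SQP/active-set iteration would happily move along the boundary itself.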
(From the comments) What is the difference between SQP and successive linearization? SQP relies on linearization at the beginning of each iteration, so successive linearization is a tool used within SQP. That being said, "successive linearization" is not the name of any specific optimization method; it just means that, whatever iterations a given algorithm performs, there is a linearization step at some point in each of them.