I was revising my notes from an optimization class I did a few years ago. After reading a bit, I started questioning myself about the benefit of using Simplex over other optimization methods.
The Simplex method has an exponential or polynomial complexity in the worst case. This makes it be quite slow to solve real problems in a normal computer (as I have verified myself). Also, this method can only be applied to linear problems, which is a small subset of all the possible problems.
In the other hand, other methods like Lagrange/KKT multiplier method seems much less restrictive. At the same time, when I checked the steps to get the optimal point using this methods, they seem quite simple:
- Calculate the Lagrangian $\mathcal{L}(x_{1},..,x_{n},\lambda_{1},..,\lambda_{l},\mu_{1},..,\mu_{k})$
- Solve $\vec{\nabla \mathcal{L}} = \vec{0}$
- Check conditions
Now my question is, why would you use Simplex over KKT? The later seems faster and less restrictive. Is there any case where Simplex can be used and KKT not?
You could argue that it can be difficult to solve the equation system in step 2, but, is it for a computer? Considering a linear problem, it can be solved using matrices. Or, is any of the steps in KKT more difficult than solving a Simplex problem that may require millions of iterations to finish?
I have the feeling I am missing something or misunderstanding these methods but I can't find what's wrong in my reasoning.
Let me restrict myself to linear problems, because for nonlinear problems you cannot choose the simplex method.
Although the simplex method has exponential complexity in the worst case, in practice it is still quite fast on most real world problems. That is mostly due to the spectacular improvements made by CPLEX and Gurobi in their simplex implementation. The paper A Brief History of Linear and Mixed-Integer Programming Computation by Robert Bixby (the last two letters in Gurobi) provides a good overview of the historic developments.
Solving $\vec{\nabla \mathcal{L}} = \vec{0}$ only works if you have equality constraints and if the variables are not restricted in sign. To solve a linear optimization problem in standard form: $$\min\{c^Tx : Ax \geq b, x\geq 0\},$$ what you need to solve are the KKT conditions: $$Ax \geq b$$ $$A^T y \leq c$$ $$y^T(Ax-b)=0$$ $$x\geq 0, y\geq 0.$$ In other words, you need to find a pair $(x,y)$ where $x$ is feasible to the primal problem, $y$ is feasible for the dual problem, and the complementary slackness (CS) condition $y^T(Ax-b)=0$ is satisfied. That is by no means easy! What interior point methods do is replace the CS condition with $y^T(Ax-b)=\mu$ and solve the problem iteratively while gradually letting $\mu \downarrow 0$. Whether that has an advantage compared to the simplex method depends on the structure of $A$. Interior point methods are fast when $AA^T$ is sparse, and the rows/columns can be permuted in a way that its Cholesky factor is also sparse.
In practice it is also important to detect infeasible or unbounded problems. To do that, interior point methods often solve the homogeneous model. The paper The Mosek Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm. by Andersen & Andersen shows how solving the homogenous model reveals ill posed problems.
The running time of an interior point method is very predictable, but that of the simplex method is not. Therefore, if you run CPLEX or Gurobi with their default settings, they actually run the simplex method and the interior point method in parallel, and terminate as soon as one method finds the optimal solution.