Why are constraints distinct from cost functions in nonlinear programming?


In virtually all treatments of optimization, constraints are kept distinct from cost functions. In problems like linear programming this is a necessity, since a linear cost function cannot express such bounds. But in nonlinear programming, the constraints could be folded into the cost function, say as a multiplier or penalty term that guarantees every point violating the constraints is worse than every point satisfying them. However, I don't see this approach in the literature I have read. Why is it pragmatically useful to keep constraints distinct from the cost function?


Why we usually don't do this

If we consider nonlinear programming in full generality - an arbitrary cost function, with arbitrary constraints - then the problem is too general to solve. No value of the objective function at any point can tell you anything about any other point.

So we usually limit nonlinear programming to specific classes of functions. Often we want our functions to be differentiable, for example, so that we can do things with gradients. See also: convex programming.

Turning constraints into terms in the cost function often means that the cost function isn't as nice as we wanted it to be, depending on how you do it. And then our methods don't apply.

Actually, we do it sometimes

Techniques where we make constraints be part of the cost function are called penalty methods in optimization.

Suppose that you want to minimize $f(x)$ subject to $g(x) \le 0$. How do you do it?

Here's one approach. Define $g^+(x) = \max\{0, g(x)\}$. Then, minimize $f(x) + C \cdot g^+(x)$ for some large $C$. The good news is that for $C$ sufficiently large, this is equivalent to the original constrained problem (under some reasonable assumptions).

The bad news is that (as alluded to earlier) the objective function $f(x) + C \cdot g^+(x)$ isn't very nice: it's not differentiable when $g(x) = 0$. This means, for example, that we can't minimize it just by finding critical points: we'd have to separately check the set of all points where $g(x)=0$, which is the entire boundary of our original feasible region. This defeats the whole point of getting rid of constraints.
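To make this concrete, here is a minimal sketch of the exact penalty just described, on a hypothetical toy problem (a quadratic objective with a unit-disc constraint) chosen purely for illustration; the functions $f$, $g$ and the value $C = 10$ are my own assumptions, not part of the original question.

```python
# A minimal sketch of the exact penalty f(x) + C * g^+(x), on a hypothetical toy problem:
# minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2  subject to  g(x) = x1^2 + x2^2 - 1 <= 0.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def exact_penalty(x, C=10.0):
    g_plus = max(0.0, g(x))        # g^+(x) = max{0, g(x)}
    return f(x) + C * g_plus       # has a kink wherever g(x) = 0

# A derivative-free method still works on the nonsmooth penalized objective.
# For this toy problem, C = 10 exceeds the relevant Lagrange multiplier, so the
# unconstrained minimizer matches the constrained one (roughly (0.894, 0.447)),
# but gradient-based reasoning breaks down exactly on the constraint boundary.
res = minimize(exact_penalty, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(res.x, g(res.x))
```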

Here's another approach: minimize $f(x) + C \cdot [g^+(x)]^2$ for some large $C$. This is a much nicer function to deal with. Being differentiable doesn't just help us look for critical points; it's good for numerical methods as well.

The bad news is that there's often no value of $C$ we can take for which minimizing this new function will actually give us the answer to the original problem. The best we can hope for is that as $C \to \infty$, the unconstrained optimum will approach the correct solution to the original problem. In practice:

  • It takes very mild assumptions to know that if we converge to something as $C \to \infty$, it will be to the correct solution.
  • Some fairly strong assumptions on top of that are required to know that convergence will actually occur. For example, we might assume that the (original) objective function is coercive, which is asking for a lot.
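Here is a minimal sketch of the quadratic-penalty scheme described above, on the same hypothetical toy problem as in the earlier snippet. Increasing $C$ in stages and warm-starting each solve from the previous minimizer is a common way to run this in practice, though the specific schedule of $C$ values here is arbitrary.

```python
# A minimal sketch of the quadratic penalty f(x) + C * [g^+(x)]^2, on the toy problem:
# minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2  subject to  g(x) = x1^2 + x2^2 - 1 <= 0.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def quadratic_penalty(x, C):
    g_plus = max(0.0, g(x))
    return f(x) + C * g_plus ** 2   # differentiable, even where g(x) = 0

x = np.array([0.0, 0.0])
for C in [1.0, 10.0, 100.0, 1000.0]:
    # Smooth unconstrained subproblem, warm-started from the previous solution.
    x = minimize(lambda y: quadratic_penalty(y, C), x, method="BFGS").x
    print(C, x, g(x))
# For every finite C the minimizer sits slightly outside the feasible set
# (g(x) > 0); it only approaches the constrained optimum as C grows.
```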