Lagrange multipliers for half-open constraints

The examples I see using Lagrange multipliers usually express a constraint as something like

$$x^2 + y^2 \ge C$$

but then solve the problem as if the constraint were

$$x^2 + y^2 = C$$

which works for many applications. But what if I'm not certain the objective will be minimized on the boundary of the constraint region? Can Lagrange multipliers still be used?

The application is support vector machines, where we minimize the 2-norm of a vector $\vec w$ subject to constraints of the form

$$\vec w \cdot \vec x_i + b \ge 1,$$

one constraint for each training vector $\vec x_i$.

In this case, not all of the vectors will satisfy (and, indeed, not all vectors can satisfy)

$$\vec w \cdot \vec x_i + b = 1$$

in the optimal solution.

If I create a Lagrangian like this:

$$L = \vec w \cdot \vec w - \sum_i \lambda_i\bigl(1 - (\vec w \cdot \vec x_i + b)\bigr)$$

then what's to stop $\lambda_i$ from going to $-\infty$ for those vectors for which

$$\vec w \cdot \vec x_i + b > 1 ?$$
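To make this worry concrete, here is a small numeric sketch (using a made-up margin value of $2$ for one such point) of how an unconstrained $\lambda_i$ lets the Lagrangian decrease without bound:

```python
# Hypothetical point strictly inside the margin: w.x + b = 2 > 1.
margin = 2.0

# The term this constraint contributes to L: -lambda * (1 - (w.x + b)).
def term(lam):
    return -lam * (1 - margin)

# With no sign restriction, lambda -> -infinity drives this term (and
# hence L) to -infinity; requiring lambda >= 0 is what rules that out.
print([term(lam) for lam in (0.0, -10.0, -1000.0)])
```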

Are Lagrange multipliers the right tool here?

There are 3 best solutions below

Answer 1:

Define $L(x,y,\lambda) = f(x,y) - \lambda\,(g(x,y) - c)$, where $f(x,y)$ is the function to be maximized and $g(x,y) \leq c$ is the constraint.

Then a max or min of $f(x,y)$ subject to $g(x,y)\leq c$ occurs at a point satisfying

$$\frac {\partial L}{\partial x}=0,\qquad \frac {\partial L}{\partial y}=0,\qquad \lambda\,(g(x,y)-c)=0.$$

The last equation (complementary slackness) holds either because the constraint is active, $g(x,y)=c$, or because $\lambda=0$.

Once you have solved for the candidate points, keep the ones that satisfy the inequality $g(x,y)\leq c$, and then check the Hessian matrix to classify them as maxima or minima.
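As a concrete sketch of this recipe (a toy example of my own, not from the question): maximize $f(x,y)=x+y$ subject to $x^2+y^2 \le 1$, working the two complementary-slackness cases by hand and checking the result in code:

```python
import math

# Maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 <= 1,
# following the recipe: stationarity of L = f - lambda*(g - 1)
# together with complementary slackness lambda*(g - 1) = 0.

f = lambda x, y: x + y

# Case lambda = 0: stationarity needs grad f = 0, but grad f = (1, 1),
# so there is no interior candidate.

# Case g = 1: 1 = 2*lambda*x and 1 = 2*lambda*y force x = y, and
# x^2 + y^2 = 1 then gives x = y = +/- 1/sqrt(2).
r = 1 / math.sqrt(2)
candidates = [(r, r), (-r, -r)]

# Both candidates lie on the boundary, hence satisfy the inequality;
# pick the one with the largest objective value.
best = max(candidates, key=lambda p: f(*p))
print(best, f(*best))   # the maximum sqrt(2) at (1/sqrt(2), 1/sqrt(2))
```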

You are on the right track; the Lagrangian, together with this complementary-slackness condition, is the generalization you are looking for.

I hope that answered your question.

Answer 2:

With that sort of constraint, we have two possibilities - either the maximum comes in the interior of the region, or it comes on the boundary.

If it comes in the interior, it's a "normal" optimization problem, in which we look for critical points where the gradient is zero. If it comes on the boundary, we look for critical points there with Lagrange multipliers. And since we don't know which it'll be in advance, we do both. An easy example to illustrate:

Example: Maximize $f(x,y)=xy$ on the unit disk $g(x,y)=x^2+y^2\le 1$

The gradient of $f$ is $\nabla f(x,y)=(y,x)$. This is zero at $(0,0)$, inside the disk. The value there is $0$. On the boundary $g(x,y)=1$, we look for places where $\nabla f$ is parallel to $\nabla g=(2x,2y)$, solving the system $y=\lambda\cdot 2x$, $x=\lambda\cdot 2y$. Substituting the second equation in the first, $y=\lambda^2\cdot 4y$, and $\lambda=\pm\frac12$. The choice $\lambda=\frac12$ leads to two points $(\frac1{\sqrt{2}},\frac1{\sqrt{2}})$ and $(-\frac1{\sqrt{2}},-\frac1{\sqrt{2}})$ with $y=x$, and the choice $\lambda=-\frac12$ leads to two points $(\frac1{\sqrt{2}},-\frac1{\sqrt{2}})$ and $(-\frac1{\sqrt{2}},\frac1{\sqrt{2}})$ with $y=-x$. The first pair give $f(x,y)=\frac12$, while the second pair give $f(x,y)=-\frac12$.

Out of all these critical points, the largest values are $\frac12$ at $(x,y)=(\frac1{\sqrt{2}},\frac1{\sqrt{2}})$ and $(x,y)=(-\frac1{\sqrt{2}},-\frac1{\sqrt{2}})$, so that's the maximum.
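As a quick numeric cross-check of the candidates found above (a sketch, comparing them against a coarse brute-force scan of the disk):

```python
import math

f = lambda x, y: x * y

# The five candidates from the analysis: the interior critical point
# plus the four boundary points with y = x or y = -x.
r = 1 / math.sqrt(2)
candidates = [(0, 0), (r, r), (-r, -r), (r, -r), (-r, r)]
best = max(f(x, y) for x, y in candidates)

# A brute-force scan over the closed disk agrees up to grid resolution.
scan = max(f(s * math.cos(t), s * math.sin(t))
           for k in range(400) for t in [k * 2 * math.pi / 400]
           for j in range(51) for s in [j / 50])
print(best, scan)   # both approximately 1/2
```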

But of course, we had to go through the full process, checking both the interior and the boundary. It's just like the one-dimensional case where a maximum can occur either in the interior or at an endpoint of the interval. In fact, we might need to go through several layers; if the constraints are something like a polyhedron, we have to check the interior, the faces, the edges, and the vertices all separately.

Answer 3:

For posterity, let me state more fully the answer Misha alluded to in the comments. A Lagrange-multiplier formulation should, for exactness, specify that $$\lambda \ge 0.$$

For these half-open constraints, the Karush–Kuhn–Tucker (KKT) conditions are the appropriate tool. They impose conditions of "complementary slackness": for all constraints of the form $$g_i(\vec x) \le 0,$$ the KKT theorem dictates that $$\lambda_i g_i(\vec x) = 0.$$ These can be satisfied either by $$g_i(\vec x) = 0$$ or by $$\lambda_i = 0.$$
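To see complementary slackness at work on the SVM formulation from the question, here is a hand-worked 1-D toy example (the data points and multiplier values are my own illustration): minimize $\vec w \cdot \vec w$ subject to $y_i(w x_i + b) \ge 1$ for three points, one of whose constraints ends up inactive:

```python
# A minimal 1-D SVM check of the KKT conditions, assuming three
# training points: x = 1 and x = 2 with label +1, x = -1 with label -1.
# We minimize w^2 subject to y_i * (w * x_i + b) >= 1.
# The optimum is w = 1, b = 0 (margin boundaries at x = +/- 1).

w, b = 1.0, 0.0
data = [(1.0, +1), (2.0, +1), (-1.0, -1)]

# Each constraint rewritten as g_i = 1 - y_i * (w * x_i + b) <= 0.
slacks = [1 - y * (w * x + b) for x, y in data]
print(slacks)   # the point x = 2 has strictly negative slack: inactive

# Complementary slackness lambda_i * g_i = 0 forces lambda = 0 for the
# inactive constraint; stationarity 2w = sum_i lambda_i y_i x_i and
# 0 = sum_i lambda_i y_i then give lambda = 1 for the two active ones.
lams = [1.0, 0.0, 1.0]
assert all(l * g == 0 for l, g in zip(lams, slacks))             # slackness
assert 2 * w == sum(l * y * x for l, (x, y) in zip(lams, data))  # dL/dw = 0
assert 0 == sum(l * y for l, (x, y) in zip(lams, data))          # dL/db = 0
```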

https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions