Lagrange multipliers and critical points


I have (probably) a fundamental problem understanding something related to critical points and Lagrange multipliers.

As we know, if a differentiable function attains an extreme value at an interior point of its domain, then its gradient is zero there.

Now, when dealing with constrained optimization using Lagrange multipliers, we also look for an extreme value of the function, restricted to some curve.

So why, in the case of constrained optimization, can't we also just search for points where the gradient is 0? What am I missing here?

Thank you.


Answer 1 (8 votes)

So if the gradient were zero on the constraint surface, that would be fine (and it would satisfy the Lagrange condition). But typically this doesn't happen; for instance, if you're extremizing $f(x,y)=x+y$, its gradient is never $0$, but this function still has extrema on the unit circle.

The key fact here is that if $S$ is the surface $\{ \mathbf{x} : g(\mathbf{x})=0 \}$, the tangent (hyper)plane of $S$ at each point is perpendicular to the gradient of $g$ at that point. This means that the Lagrange condition can be understood as "the gradient of $f$ is perpendicular to the tangent (hyper)plane of $S$". So the Lagrange condition tells you that if you move in a direction tangent to $S$, $f$ will not change to first order. If it did, then you could go a sufficiently short distance in one direction to increase $f$, and a sufficiently short distance in the opposite direction to decrease $f$. Finally, it turns out that what we've said about the tangent directions transfers to the surface itself.

Thus the Lagrange condition is necessary (under the regularity hypotheses of the Lagrange multiplier theorem, which include the constraint qualification that $\nabla g \neq 0$ on $S$). Just like in the unconstrained case, it is not sufficient.
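As a sanity check on the $f(x,y)=x+y$ example above, here is a minimal sketch that solves the Lagrange conditions on the unit circle symbolically (the use of sympy is my own choice; any solver for polynomial systems would do):

```python
import sympy as sp

# Lagrange conditions for f(x, y) = x + y on the unit circle
# g(x, y) = x^2 + y^2 - 1 = 0.
x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1

# grad f = lam * grad g, together with the constraint g = 0
eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
sols = sp.solve(eqs, [x, y, lam], dict=True)
# Two constrained extrema, at (±√2/2, ±√2/2), even though
# grad f = (1, 1) never vanishes anywhere in the plane.
```

Note that the unconstrained equation $\nabla f = 0$ has no solutions at all here; the extrema only appear once the multiplier $\lambda$ is introduced.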

Answer 2 (2 votes)

Because the gradient along the constraint can be zero even though the gradient itself isn't. For instance, in $\Bbb R^3$, take the function $f(x,y,z)=z$ and constrain it to the unit sphere. The gradient of $f$ is $(0,0,1)$, which is non-zero everywhere.

However, imagine you lived on the sphere and had no idea that it sits inside a bigger space. In other words, like we do on Earth if we forget that we can fly up or dig down. Then we would think that the gradient of the function $f$ was zero at the north and south poles, simply because in every direction we can conceive of, $f$ is stationary at those two points. Those are the kinds of points Lagrange multipliers let us find.

Of course, if the true gradient happens to be zero on the constraint, then it is also zero along the constraint. However, that is only a small special case among all the cases where the gradient along the surface is zero, and the method of Lagrange multipliers picks those up automatically along with all the others.
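The poles example can be illustrated numerically; the meridian parametrization below is my own illustrative choice, not part of the answer:

```python
import numpy as np

# f(x, y, z) = z restricted to the unit sphere.  The gradient of f in
# R^3 is (0, 0, 1), which never vanishes.  But walking along a meridian,
# parametrized by the polar angle theta, the restricted function is
# z = cos(theta), and its derivative vanishes at the poles.
theta = np.linspace(0.0, np.pi, 10001)   # north pole (0) to south pole (pi)
z = np.cos(theta)
dz = np.gradient(z, theta)               # derivative of f along the meridian
# dz is (numerically) zero at the two endpoints -- the poles -- and
# equals -sin(theta) in between, e.g. -1 at the equator.
```

An inhabitant of the sphere measuring only `dz` would call the poles critical points, even though $\nabla f$ never vanishes in $\Bbb R^3$.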

Answer 3 (0 votes)

By searching only for vanishing gradients, you will typically miss the constrained extrema.

Suppose the constraints are given by $S=\{x: g_1(x)=0,\dots,g_k(x)=0\}$. Take any $C^1$ curve $x(t)$ satisfying the constraints and with $x(0)=a$. Then for all $i$: $$ 0 = \frac{d}{dt}\bigg|_{t=0} g_i(x(t))= \nabla g_i (a) \cdot x'(0)$$

Now, if $f(x)$ has an extremum at $a$ subject to these constraints, then we must also have $$ 0 = \frac{d}{dt}\bigg|_{t=0} f(x(t))= \nabla f (a) \cdot x'(0)$$ This certainly holds if $\nabla f(a)=\lambda_1 \nabla g_1(a) +\cdots + \lambda_k \nabla g_k(a)$, i.e. if $\nabla f(a)$ is a linear combination of the $\nabla g_i(a)$'s. It is now a theorem of linear algebra that this is the only possibility (provided the $\nabla g_i(a)$'s are linearly independent, so that every vector orthogonal to all of them arises as some $x'(0)$). The theorem is that if $W$ is a subspace of a finite-dimensional inner product space (here the span of the $\nabla g_i(a)$'s) and $u \perp W^\perp$, then $u\in W$.
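To make the multi-constraint condition concrete, here is a small numeric sketch. The specific constraints ($g_1$ the unit sphere, $g_2$ the plane $z=0$, so that $S$ is the unit circle in the $xy$-plane) and the objective $f(x,y,z)=x$ are my own illustrative choices:

```python
import numpy as np

# Constrained maximum of f(x, y, z) = x on the intersection of the
# unit sphere g1 = x^2 + y^2 + z^2 - 1 and the plane g2 = z,
# which is attained at a = (1, 0, 0).
a = np.array([1.0, 0.0, 0.0])
grad_f  = np.array([1.0, 0.0, 0.0])   # grad f = (1, 0, 0)
grad_g1 = 2 * a                       # grad g1 = (2x, 2y, 2z) at a
grad_g2 = np.array([0.0, 0.0, 1.0])   # grad g2 = (0, 0, 1)

# Solve grad_f = lam1 * grad_g1 + lam2 * grad_g2 in the least-squares
# sense; an exact fit means grad_f lies in the span of the constraint
# gradients, as the argument above predicts.
G = np.column_stack([grad_g1, grad_g2])
lam, res, rank, _ = np.linalg.lstsq(G, grad_f, rcond=None)
# lam comes out as (1/2, 0): grad_f = (1/2) * grad_g1.
```

Here the two constraint gradients at $a$ are linearly independent, so the regularity hypothesis in the argument above is satisfied.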

Answer 4 (0 votes)

In this post, I am only going to work with 2 variables, but this can be generalized to $n$ variables.

Suppose we want to maximize $f(x,y)$ subject to the constraint $g(x,y)=0$.

We can express the constraint curve $g(x,y)=0$ parametrically, say as $(X(t), Y(t))$.

Say that $(x_0, y_0)$ is an extremum of $f(x,y)$ along the curve $g(x,y)=0$. We can find a value $t_0$ such that $X(t_0)=x_0$ and $Y(t_0)=y_0$. Then the point $t_0$ must be an extremum of the function $f(X(t),Y(t))$, so by the chain rule: $$\frac{d}{dt}f(X(t),Y(t))\bigg|_{t=t_0}=\frac{\partial f}{\partial x}(x_0,y_0)\cdot X'(t_0)+\frac{\partial f}{\partial y}(x_0,y_0)\cdot Y'(t_0)=0$$

This means that $\nabla f(x_0,y_0)\cdot\big(X'(t_0),Y'(t_0)\big)=0$.

In other words, the vector $\nabla f(x_0,y_0)$ is perpendicular to the tangent vector $\big(X'(t_0),Y'(t_0)\big)$.

To add on, we know that, while $\big(X'(t_0),Y'(t_0)\big)$ is a tangent vector to the curve $g(x,y)=0$, the vector $\nabla g(x_0,y_0)$ is a normal vector to the curve (since $g(x,y)=0$ is a level curve of $g$). Ergo, $\nabla g(x_0,y_0)$ is also perpendicular to this tangent vector.

Since $\nabla f(x_0,y_0)$ and $\nabla g(x_0,y_0)$ are both perpendicular to the same nonzero tangent vector in $\Bbb R^2$, they must be parallel. In other words, $\nabla f(x_0,y_0)=\lambda \nabla g(x_0,y_0)$ for some scalar $\lambda$.
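The two-variable argument can be checked numerically for a concrete case. Assuming the circle $g(x,y)=x^2+y^2-1$ parametrized by $(\cos t, \sin t)$ and the objective $f(x,y)=x+y$ (my choice of example, whose constrained maximum is at $t_0=\pi/4$):

```python
import numpy as np

# Constraint g(x, y) = x^2 + y^2 - 1 = 0, parametrized as (cos t, sin t);
# objective f(x, y) = x + y, maximized on the circle at t0 = pi/4.
t0 = np.pi / 4
point   = np.array([np.cos(t0), np.sin(t0)])
tangent = np.array([-np.sin(t0), np.cos(t0)])  # (X'(t0), Y'(t0))
grad_f  = np.array([1.0, 1.0])                 # grad f = (1, 1)
grad_g  = 2 * point                            # grad g = (2x, 2y)

# Both gradients are perpendicular to the tangent vector...
dot_f = grad_f @ tangent
dot_g = grad_g @ tangent
# ...and therefore parallel to each other in R^2 (2x2 determinant is 0).
cross = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]
```

The vanishing determinant is exactly the parallelism $\nabla f = \lambda \nabla g$ from the conclusion above, with $\lambda = 1/\sqrt{2}$ at this point.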