Eliminate the Lagrange Multiplier


I am reviewing the method of Lagrange multipliers, and it strikes me: why don't we just eliminate the multiplier $\lambda$ once and for all and work with the remaining equations, since we are (mostly) only interested in locating the points at which extrema occur? I believe that, for most purposes, it is safe to assume that $\lambda$ does not vanish (see this for example), and even if we want to be careful, we only need to check that particular case. So, instead of solving for $\lambda$ and plugging it back into the equations to compute $x, y, z$, we could eliminate $\lambda$ altogether and solve directly for $x, y, z$.

So, for instance, for a two-variable situation, we might want to recast the equations as

$\frac{f_x(x, y)}{f_y(x, y)}=\frac{g_x(x, y)}{g_y(x, y)}$.

I figure there must be some difficulties with this approach, since many sample solutions I see do compute $\lambda$, but what are they?
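To make the idea concrete, here is a minimal sympy sketch of what I have in mind; the example problem (extremize $f=xy$ on the unit circle) is my own choice, and I cross-multiply the ratio equation into $f_x g_y - f_y g_x = 0$ to avoid dividing by $f_y$ or $g_y$:

```python
# Minimal sketch (sympy): locate candidates without lambda by cross-multiplying
# f_x/f_y = g_x/g_y into f_x*g_y - f_y*g_x = 0. The example problem is my own.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x * y                    # objective (illustrative choice)
g = x**2 + y**2 - 1          # constraint g = 0 (unit circle)

# "gradients are parallel", written without any division or lambda
parallel = sp.diff(f, x) * sp.diff(g, y) - sp.diff(f, y) * sp.diff(g, x)

print(sp.solve([parallel, g], [x, y], dict=True))
# four candidates (+-sqrt(2)/2, +-sqrt(2)/2), the same as with a multiplier
```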

There are three answers below.

Answer 1:

Lagrange multipliers say that, given that $f$ is a function you are trying to minimize/maximize and $g$ is a constraint, you can use the condition $\nabla g=\lambda\nabla f$ to create a system of equations in $x$, $y$, $z$, and $\lambda$. With as many equations as unknowns, one can solve for $x$, $y$, and $z$, and the quickest way to do so sometimes requires solving for $\lambda$ first.

If I am understanding your question correctly, you are asking why we cannot just set $\nabla g=\nabla f$ (functionally taking $\lambda=1$) and thus ignore $\lambda$. The issue is that this misunderstands the motivation for the method of Lagrange multipliers; the method does not work without the nonzero scaling factor $\lambda$.

The method guarantees that $f$ is maximized/minimized where $\nabla f$ is parallel to $\nabla g$: the smallest/largest level curve/surface of $f$ should just barely touch the constraint set $g=0$, and so their gradients should be parallel there. It does not guarantee that these gradients are equal, so we cannot assume they have the same magnitude or orientation. The $\lambda$ is necessary in that sense.
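For illustration, here is a small sympy sketch of the full system in $x$, $y$, $\lambda$, using the $\nabla g=\lambda\nabla f$ convention from this answer; the example problem is my own choice:

```python
# Small sketch (sympy) of the full Lagrange system, using the
# grad g = lambda * grad f convention from this answer. The example
# (extremize f = x + 2y on the circle x**2 + y**2 = 5) is my own choice.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + 2*y
g = x**2 + y**2 - 5

eqs = [sp.diff(g, x) - lam * sp.diff(f, x),   # g_x = lam * f_x
       sp.diff(g, y) - lam * sp.diff(f, y),   # g_y = lam * f_y
       g]                                     # the constraint itself
print(sp.solve(eqs, [x, y, lam], dict=True))
# two candidates, (1, 2) and (-1, -2), with lam = 2 and lam = -2;
# here lambda comes out first and x, y follow from it
```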

Answer 2:

The idea behind finding extrema of $f(v)$ subject to the constraint $g(v)=0$ is that you exclude points at which $\nabla f$ and $\nabla g$ point in different directions. The remaining points are: points where $\nabla f$ or $\nabla g$ does not exist; points where both exist and at least one is $0$; and points where both exist, are nonzero, and point in the same direction (note: "same direction" includes the geometrically opposite direction, so we are talking about non-oriented direction). These should be exceptional points, few enough that you can check them by hand.

For a typical calculus problem, $\nabla f$ and $\nabla g$ always exist and $\nabla g$ is always nonzero on the constraint, so you are just looking for points at which $\nabla f=\lambda\nabla g$ for some $\lambda$.

Now, of course, the method is also phrased as $\nabla g=\lambda\nabla f$. This is bad because it misses the case where $\nabla f=0$. If it is phrased this way, however, you can assume $\lambda\neq 0$, not for any mathematical reason, but because calculus exercises won't give you an example where $\nabla g=0$ on the constraint.

But it is possible to do something else instead. For example, if these are functions on the plane, then each gradient is a two-dimensional vector. In this case, to check whether one is a multiple of the other, you can compute the determinant of the $2\times 2$ matrix they form and see whether it is $0$. Similarly, in three dimensions you can check whether the cross product vanishes, as in the sketch below.
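Here is a numerical sketch of this test; the helper names and the tolerance are my own illustrative choices, not a standard API:

```python
# Sketch of the determinant / cross-product test for parallel gradients.
# Helper names and the tolerance are illustrative choices (numpy).
import numpy as np

def parallel_2d(u, v, tol=1e-12):
    # 2x2 determinant vanishes  <=>  u, v linearly dependent
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

def parallel_3d(u, v, tol=1e-12):
    # cross product is the zero vector  <=>  u, v linearly dependent
    return np.linalg.norm(np.cross(u, v)) < tol

print(parallel_2d(np.array([1.0, 2.0]), np.array([-3.0, -6.0])))          # True
print(parallel_3d(np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])))  # False
```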

Just for completeness' sake, here is an example with $\nabla g=0$.

Find the minimum of $f(x,y)=x^{2}+y^{2}$ subject to the constraint $y^{2}+2y+1=0$. (The constraint is $(y+1)^2=0$, i.e. the line $y=-1$, on which $\nabla g=(0,\,2y+2)$ vanishes identically; so no $\lambda$ satisfies $\nabla f=\lambda\nabla g$ at the minimum $(0,-1)$.)
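A quick sympy sketch (my own check, using the $f$ and $g$ above) confirms this:

```python
# Sketch (sympy): on this constraint grad g vanishes identically, so the
# Lagrange system has no solution, while the determinant formulation keeps
# the whole line y = -1 as candidates (among which f is minimized at (0, -1)).
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
g = y**2 + 2*y + 1           # = (y + 1)**2, so grad g = (0, 0) when y = -1

lagrange = [sp.diff(f, x) - lam * sp.diff(g, x),
            sp.diff(f, y) - lam * sp.diff(g, y),
            g]
print(sp.solve(lagrange, [x, y, lam], dict=True))   # [] -- no solution at all

det = sp.diff(f, x) * sp.diff(g, y) - sp.diff(f, y) * sp.diff(g, x)
print(sp.solve([det, g], [x, y], dict=True))        # y = -1, with x left free
```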

Answer 3:

In principle, you can always eliminate Lagrange multipliers. For example, if you want to find extrema of $f(x,y)$ under the constraint $g(x,y)=0$, solving two equations $f_x g_y-f_y g_x=0$ and $g=0$ will give you the same candidates as those given by the Lagrange multiplier method (in addition, the case of $\nabla g=0$, often forgotten by students when using the Lagrange multiplier method, is naturally included in this way). The reason is that $\nabla f$ and $\nabla g$ being linearly dependent is equivalent to $f_x g_y-f_y g_x=0$; it is also equivalent to the existence of $\lambda$ such that $\nabla f=\lambda \nabla g$ (or $\nabla g=0$).

There are many undergraduate problems for which this approach produces shorter calculations. Problems with linear constraints are exceptions: for example, the maximization of the entropy $-\sum_{j=1}^{n}x_j \log_2 x_j$ under the constraint $\sum_{j=1}^{n}x_j=1$ (and $x_j \geq 0$) is a little easier to solve using the Lagrange multiplier method, as sketched below.
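Here is a minimal sympy sketch of this linear-constraint example for $n=3$; the instantiation, and the use of natural logarithms (which only rescales $\lambda$), are my own choices:

```python
# Minimal sketch (sympy) of the entropy example for n = 3, with natural logs
# (the base 2 only rescales lambda). The instantiation is my own.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
lam = sp.symbols('lam', real=True)
xs = [x1, x2, x3]
H = -sum(xi * sp.log(xi) for xi in xs)    # entropy to maximize
constraint = sum(xs) - 1

# Stationarity of H - lam*(sum(xs) - 1): each equation reads
# -log(xi) - 1 - lam = 0, so all xi are equal; the multiplier
# exposes the symmetry immediately.
eqs = [sp.diff(H - lam * constraint, xi) for xi in xs] + [constraint]
print(sp.solve(eqs, xs + [lam], dict=True))   # x1 = x2 = x3 = 1/3
```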

Now, if we can in principle avoid the use of Lagrange multipliers, why should we introduce them? One reason is their importance in numerical analysis. I'm not a specialist in numerical analysis, but I can name two algorithms that use the idea of Lagrange multipliers: the Augmented Lagrangian Method (https://en.wikipedia.org/wiki/Augmented_Lagrangian_method) and the Primal-Dual Interior-Point Method (https://en.wikipedia.org/wiki/Interior-point_method). Of course, there are algorithms that avoid the use of Lagrange multipliers (simple Newton methods, for example), and their comparison should be made on a case-by-case basis.
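To give a feel for where the multiplier appears in such algorithms, here is a minimal augmented-Lagrangian sketch in numpy; the test problem, step sizes, and iteration counts are illustrative assumptions of mine, not taken from any particular library:

```python
# Minimal augmented-Lagrangian sketch (numpy). The test problem
# (min x**2 + y**2 subject to x + y = 1), step sizes, and iteration
# counts are illustrative choices.
import numpy as np

def grad_f(v):                    # gradient of f(x, y) = x**2 + y**2
    return 2.0 * v

def g(v):                         # constraint g(x, y) = x + y - 1
    return v[0] + v[1] - 1.0

grad_g = np.array([1.0, 1.0])     # constant gradient of the linear constraint

v = np.zeros(2)                   # primal iterate
lam = 0.0                         # multiplier estimate
rho = 10.0                        # penalty weight

for _ in range(50):
    # inner loop: gradient steps on L(v) = f(v) + lam*g(v) + (rho/2)*g(v)**2
    for _ in range(200):
        v -= 0.01 * (grad_f(v) + (lam + rho * g(v)) * grad_g)
    lam += rho * g(v)             # standard multiplier update
print(v, lam)                     # approx [0.5, 0.5], with lam approx -1.0
```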

Edit: Another motivation might be its application to the calculus of variations. For example, consider the problem of finding a curve $y=y(x)\geq 0$ with $y(-1)=y(1)=0$ and a given arc length $L=\int_{-1}^{1}\sqrt{1+(dy/dx)^2}\, dx$ that maximizes the area $A=\int_{-1}^{1}y(x)\, dx$ between $y=0$ and $y=y(x)$ (Dido's problem). Applying the Lagrange multiplier method, we get an ODE $$ y-\lambda \sqrt{1+(dy/dx)^2}+\frac{\lambda(dy/dx)^2}{\sqrt{1+(dy/dx)^2}}=k, $$ where $\lambda$ is the Lagrange multiplier and $k$ is a constant of integration (derivation: write down the Beltrami identity for the Lagrangian $L(y,dy/dx)=y-\lambda \sqrt{1+(dy/dx)^2}$). Solving the ODE with the boundary conditions $y(-1)=y(1)=0$ shows that the curve $y(x)$ is an arc of a circle, namely $\{ x^2+(y-k)^2=\lambda^2,\, y\geq 0 \}$, where the boundary conditions give the relation $1+k^2=\lambda^2$. The arc length condition $L=\int_{-1}^{1}\sqrt{1+(dy/dx)^2}\, dx$ finally determines $\lambda$ and $k$. In this problem, it seems difficult to me to reach the conclusion by eliminating the Lagrange multiplier $\lambda$ before solving the ODE. In this sense, the introduction of the Lagrange multiplier seems essential.
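As a sanity check (my own, not part of the original argument), one can verify with sympy that the circular arc satisfies the Beltrami identity above, here with the example values $\lambda=\sqrt{5}/2$, $k=-1/2$, which satisfy $1+k^2=\lambda^2$:

```python
# Sanity check (sympy): the circular arc satisfies the Beltrami identity
# y - lam*sqrt(1 + y'**2) + lam*y'**2/sqrt(1 + y'**2) = k.
# Example values lam = sqrt(5)/2, k = -1/2 satisfy 1 + k**2 = lam**2.
import sympy as sp

x = sp.symbols('x', real=True)
lam = sp.sqrt(5) / 2
k = -sp.Rational(1, 2)

y = k + sp.sqrt(lam**2 - x**2)    # upper arc of x**2 + (y - k)**2 = lam**2
yp = sp.diff(y, x)
beltrami = y - lam * sp.sqrt(1 + yp**2) + lam * yp**2 / sp.sqrt(1 + yp**2)

for xv in [0, sp.Rational(1, 2), sp.Rational(9, 10)]:
    print(sp.simplify(beltrami.subs(x, xv) - k))   # 0 at each sample point
```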

Overall, to convince someone of the importance of introducing the Lagrange multiplier, we could argue its importance either in (i) numerical analysis or in (ii) the calculus of variations. I guess (ii) is the main reason the Lagrange multiplier method appears frequently in traditional textbooks (although the use of the multiplier does not seem essential, and is even redundant in some cases, for solving the problems in those books), since the tradition seems to have started before the widespread use of computers. I know that both (i) and (ii) may be difficult for undergraduate students, but I could not find undergraduate-level problems, solvable by hand, that demonstrate the indispensability of "$\lambda$".