Lagrangian function (where does it come from?)


Where does the Lagrangian function come from? I know about gradient vectors and $\lambda$: $$ \nabla f = \lambda \nabla g $$

But from there what steps were done to get the Lagrangian function? (This one): $$ L\left(x,\ y,\ \lambda\right)=f\left(x,y\right)-\lambda\left(g\left(x,y\right)-c\right) $$

Why can the constant $c$ be included here?

I am a high school student trying to explain it, and it most probably goes way over my head, so I am mostly looking for a reference proof I can point to in my bibliography. I can't find one; thank you for any help in that regard.

There are two answers below.

Best answer

I’ll try to stick with your notation. You are minimizing $f(x,y)$ subject to the constraint that $g(x,y) = c$.

You can learn about Lagrange multipliers without mentioning the Lagrangian. Once we write down the optimality conditions $$ \nabla f(x,y) - \lambda \nabla g(x,y) = 0 \quad \text{and} \quad g(x, y) - c = 0 $$ someone clever might notice that if we introduce the function $$ L(x,y, \lambda) = f(x,y) - \lambda (g(x,y) -c) $$ then these equations can be written as $$ \frac{\partial L}{\partial x} = 0, \quad\frac{\partial L}{\partial y}=0, \quad \frac{\partial L}{\partial \lambda}= 0 $$ which looks kind of neat.
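To see this concretely, here is a small numerical check (a sketch in Python, using an assumed example not taken from the answer: $f(x,y)=x^2+y^2$, $g(x,y)=x+y$, $c=1$). Solving the optimality conditions by hand gives $x=y=1/2$, $\lambda=1$, and finite differences confirm that all three partial derivatives of $L$ vanish there.

```python
# Assumed example (not from the answer above): minimize f(x,y) = x^2 + y^2
# subject to g(x,y) = x + y = 1. Solving 2x = lambda, 2y = lambda,
# x + y = 1 by hand gives x = y = 1/2, lambda = 1.

def L(x, y, lam):
    """Lagrangian L(x, y, lambda) = f(x, y) - lambda * (g(x, y) - c)."""
    f = x**2 + y**2
    g = x + y
    c = 1.0
    return f - lam * (g - c)

def partials(x, y, lam, h=1e-6):
    """Central finite-difference estimates of dL/dx, dL/dy, dL/dlambda."""
    dLdx = (L(x + h, y, lam) - L(x - h, y, lam)) / (2 * h)
    dLdy = (L(x, y + h, lam) - L(x, y - h, lam)) / (2 * h)
    dLdl = (L(x, y, lam + h) - L(x, y, lam - h)) / (2 * h)
    return dLdx, dLdy, dLdl

# At the constrained minimizer all three partials vanish...
print(partials(0.5, 0.5, 1.0))
# ...while at a non-optimal point they do not.
print(partials(1.0, 0.0, 1.0))
```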

Here is another way you might think of introducing the Lagrangian. One might try to enforce the constraint $g(x,y) = c$ by including a penalty term $-\lambda(g(x,y) - c)$ in the objective function. With this strategy, our new optimization problem is to minimize $$ \tag{1} f(x,y) - \lambda (g(x,y) - c) $$ with no constraints on $x$ and $y$. If we solve this problem and find that $g(x,y) < c$, then let’s increase the value of $\lambda$ a bit (which increases the penalty for having $g(x,y) < c$) and try again. We can hope to find a perfect value of $\lambda$ such that a point $(x,y)$ which minimizes (1) also satisfies $g(x,y) = c$.
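The $\lambda$-adjustment strategy above can be sketched numerically (again with the assumed example $f(x,y)=x^2+y^2$, $g(x,y)=x+y$, $c=1$, which is not from the answer): for each fixed $\lambda$, minimize $(1)$ by gradient descent, then nudge $\lambda$ in proportion to the remaining constraint violation $c - g(x,y)$.

```python
# Sketch of the lambda-adjustment idea (assumed example, not from the
# answer): f(x,y) = x^2 + y^2, g(x,y) = x + y, c = 1.

def minimize_penalized(lam, steps=500, lr=0.1):
    """Gradient descent on f(x,y) - lam * (g(x,y) - c) over x and y."""
    x = y = 0.0
    for _ in range(steps):
        x -= lr * (2 * x - lam)   # d/dx of the penalized objective
        y -= lr * (2 * y - lam)   # d/dy of the penalized objective
    return x, y

lam = 0.0
for _ in range(50):
    x, y = minimize_penalized(lam)
    violation = 1.0 - (x + y)     # c - g(x, y)
    lam += 0.5 * violation        # raise lambda while g(x, y) < c
print(x, y, lam)                  # approaches x = y = 1/2, lambda = 1
```

For this problem the loop converges to the "perfect" $\lambda=1$, at which the unconstrained minimizer of $(1)$ satisfies the constraint exactly.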

Second answer

Here's an explanation to build intuition about Lagrange multipliers. Suppose you need to optimize (maximize or minimize) $f$ under the constraint that $g(x)=0$. Note that if your constraint is of the form $g(x)=c$ where $c$ is a constant, then you can replace $g$ with $g-c$ and the rest below applies as well.

The Lagrange formalism says that you should form the Lagrangian $$L\left(x,\ \lambda\right)=f\left(x\right)-\lambda g\left(x\right)\tag{1} $$ and optimize it with respect to both $x$ and $\lambda$. It effectively turns a problem of optimization under constraints (optimize $f$ under the $g$ constraint) into a problem of optimization without constraints (optimize $L$).

Optimizing $(1)$ means that you need to solve $$\left \{ \begin{array}{ll} \nabla f(x) &= \lambda \nabla g(x) \tag{2}\\ g(x)&=0 \end{array} \right.$$ But where is that coming from?
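For a concrete instance of system $(2)$ (an assumed example, not from the answer): optimize $f(x,y)=xy$ under $g(x,y)=x+y-2=0$. The system becomes linear, $y=\lambda$, $x=\lambda$, $x+y=2$, and can be solved directly, e.g. by Cramer's rule:

```python
# System (2) for f(x,y) = x*y and g(x,y) = x + y - 2 (assumed example):
#   df/dx = lam * dg/dx  ->  y - lam = 0
#   df/dy = lam * dg/dy  ->  x - lam = 0
#   g(x, y) = 0          ->  x + y  = 2
# Written as A @ [x, y, lam] = b and solved with Cramer's rule.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def replace_col(m, col, vec):
    """Copy of m with column `col` replaced by vector `vec`."""
    return [[vec[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]

A = [[0, 1, -1],
     [1, 0, -1],
     [1, 1,  0]]
b = [0, 0, 2]

D = det3(A)
x, y, lam = (det3(replace_col(A, j, b)) / D for j in range(3))
print(x, y, lam)   # 1.0 1.0 1.0: the extremum is at (1, 1) with lambda = 1
```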

The key point to understand is that moving (by an infinitesimal step) in a direction orthogonal to the gradient of a function achieves no variation of that function. That's because the gradient of a function at a point is orthogonal to the level set of that function that goes through that point.

That was a mouthful. Hopefully a picture can help. Suppose that, viewed from above, the function $f$ looks like this (basically, each ellipse-like curve is a level set of $f$):

[Figure: level sets of $f$]

  • Suppose you are at the point $x$ where the red and green arrows meet. The red arrow represents $\nabla f(x)$ while the green arrow represents $\nabla g(x)$. Then you can see that by moving in a direction orthogonal to the green arrow, you can reach another level set, one with a higher or lower value of $f$, while preserving the constraint $g=0$. So as long as you start from a point $x$ where $\nabla f(x)$ and $\nabla g(x)$ are not collinear, you can always move to another point where the constraint is still satisfied and the value of $f$ increases or decreases.
  • Contrast this with the situation when you're at a point $x$ where both $\nabla f(x)$ (pink arrow) and $\nabla g(x)$ (blue arrow) are collinear. Now, if you move in a direction orthogonal to $\nabla g(x)$ (to satisfy the constraint), you'll also move in a direction that's orthogonal to $\nabla f(x)$. But then remember that $\nabla f(x)$ is orthogonal to the level set of $f$ that goes through $x$, so moving orthogonally to $\nabla f(x)$ achieves no variation of the function $f$ at all. In that case, you're already at an extremum under the constraint.

This shows that extrema of a function under a constraint are achieved where the gradients of the function and of the constraint are collinear, which is exactly condition $(2)$.
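The collinearity criterion can be checked numerically (assumed example, not from the answer: $f(x,y)=x^2+y^2$ under $x+y=1$): along the constraint, the 2D "cross product" of $\nabla f$ and $\nabla g$ vanishes only at the constrained extremum.

```python
# Collinearity test (assumed example): f(x,y) = x^2 + y^2, g(x,y) = x + y - 1.
# grad f = (2x, 2y), grad g = (1, 1). The gradients are collinear exactly
# when the 2D cross product 2x*1 - 2y*1 = 2(x - y) vanishes.

def cross(x, y):
    """2D cross product of grad f and grad g at (x, y)."""
    fx, fy = 2 * x, 2 * y
    gx, gy = 1.0, 1.0
    return fx * gy - fy * gx

# Walk along the constraint y = 1 - x and watch the cross product
# change sign: it is zero only at the constrained extremum (0.5, 0.5).
for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    y = 1.0 - x
    print(x, y, cross(x, y))
```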