Why does the Lagrange multiplier condition have the form $\nabla f = \lambda \nabla G$?


I was studying some multivariable calculus and we were covering the topic of Lagrange multipliers. I didn't understand exactly why the equations take the form:

$$ \nabla f = \lambda \nabla G $$

where $G$ defines the constraint curve and $f$ is the function being maximized. I understand that $\nabla f$, being the gradient, points in the direction of steepest increase of $f$, and therefore that at the constrained maximum on the curve $G$, $\nabla f$ must be perpendicular to $G$ (since there is nowhere along the curve that we can travel to increase $f$ without actually moving off of the curve $G$ itself).

My question is: why must the gradient of $G(x,y)$, the function whose zero set $G(x,y)=0$ is our constraint curve,

ALWAYS be perpendicular to the constraint curve $G$?

I understood visually why this works with a few examples such as a little hill or mountain etc...

But can somebody give a clear and explicit proof why:

$$ \begin{pmatrix} \frac{\partial G}{\partial x} \\ \frac{\partial G}{\partial y} \end{pmatrix} \perp \begin{pmatrix} 1 \\ \frac{dy}{dx} \end{pmatrix} $$

Given that $G(x,y) = 0$
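For what it's worth, a quick numerical check, taking $G(x,y)=x^2+y^2-1$ as a stand-in constraint (my own example, not anything special), confirms that the dot product vanishes, but I would still like a proof:

```python
import numpy as np

def G(x, y):
    # Example constraint: the unit circle (an assumption, for illustration)
    return x**2 + y**2 - 1

# Pick a point on the curve and compute dy/dx there by central differences,
# moving along the branch y = sqrt(1 - x^2) of the curve
x0 = 0.6
y0 = np.sqrt(1 - x0**2)
h = 1e-6
dydx = (np.sqrt(1 - (x0 + h)**2) - np.sqrt(1 - (x0 - h)**2)) / (2 * h)

# Gradient of G at (x0, y0), also by central differences
Gx = (G(x0 + h, y0) - G(x0 - h, y0)) / (2 * h)
Gy = (G(x0, y0 + h) - G(x0, y0 - h)) / (2 * h)

# grad G dotted with the tangent direction (1, dy/dx) should vanish
dot = Gx * 1 + Gy * dydx
print(abs(dot) < 1e-6)  # True
```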


There are 3 best solutions below


Heuristic answer (the sort you would see in, e.g., a thermodynamics class):

Let $G : \mathbb{R}^2 \to \mathbb{R}$. Consider the surface $G(x,y)=0$. Then to move from a point $(x,y)$ to an infinitely close point $(x+dx,y+dy)$ on the surface, we must have

$$dG=\frac{\partial G}{\partial x} dx + \frac{\partial G}{\partial y} dy = 0.$$

So we get an implicit function $y : \mathbb{R} \to \mathbb{R}$ such that $\frac{dy}{dx} = -\frac{\frac{\partial G}{\partial x}}{\frac{\partial G}{\partial y}}$. Essentially the same thing happens if $G : \mathbb{R}^{n+m} \to \mathbb{R}^m$ instead.
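This quotient formula is easy to sanity-check symbolically; sympy's `idiff` performs the implicit differentiation, and the particular $G$ below is an assumed example:

```python
import sympy as sp

x, y = sp.symbols('x y')
G = x**2 + y**2 - 1          # example constraint (an assumption, for illustration)

# dy/dx along G(x, y) = 0 by implicit differentiation
dydx = sp.idiff(G, y, x)

# It matches the quotient formula -Gx/Gy
quotient = -sp.diff(G, x) / sp.diff(G, y)
print(sp.simplify(dydx - quotient))  # -> 0
```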

If you don't like explicit differentials, you can avoid them by introducing an auxiliary variable $t$ which parametrizes the surface, and then writing

$$\frac{dG}{dt} = \frac{\partial G}{\partial x} \frac{dx}{dt} + \frac{\partial G}{\partial y} \frac{dy}{dt} = 0$$

and then using the chain rule to identify $\frac{dy}{dx}$.
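The parametrized version can be checked the same way; the unit-circle parametrization below is an assumed example:

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.cos(t), sp.sin(t)      # example parametrization of x^2 + y^2 - 1 = 0
G = x**2 + y**2 - 1

# dG/dt = Gx * dx/dt + Gy * dy/dt vanishes identically along the curve
dGdt = sp.diff(G, t)
print(sp.simplify(dGdt))         # -> 0

# and dy/dx recovered from the chain rule agrees with -Gx/Gy = -x/y
dydx = sp.diff(y, t) / sp.diff(x, t)
print(sp.simplify(dydx + x / y)) # -> 0
```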

Formal (but incomplete) answer:

The formal result being used here is called the implicit function theorem. In the general situation, you have a differentiable function $G : \mathbb{R}^{n + m} \to \mathbb{R}^m$, and are considering the implicit function $g : \mathbb{R}^n \to \mathbb{R}^m$ defined by $G(x,g(x))=0$. (Here I am abusing notation somewhat; $g$ is really only defined on a subset of $\mathbb{R}^n$, and $G$ itself may only be defined on a subset of $\mathbb{R}^{n+m}$.) We define

$$Dg : \mathbb{R}^n \to \mathbb{R}^{m \times n}$$

to be the Jacobian of $g$,

$$D_y G : \mathbb{R}^{n + m} \to \mathbb{R}^{m \times m}$$

to be the Jacobian of $G$ with respect to the "$y$" variables, and

$$D_x G : \mathbb{R}^{n+m} \to \mathbb{R}^{m \times n}$$

to be the Jacobian of $G$ with respect to the "$x$" variables. Then the implicit function theorem tells us that if $G(x_0,y_0)=0$ and $D_y G(x_0,y_0)$ is invertible, then $g$ is defined and differentiable on a neighborhood of $x_0$ and

$$Dg(x) = -D_yG(x,g(x))^{-1} D_x G(x,g(x)).$$

I don't think the rigorous proof of this statement is accessible to a multivariable calculus student. The rigorous proof I know of can be found in Strichartz's The Way of Analysis. It constructs $g$ using a contraction principle, but the procedure is complicated by the fact that the domain has to be open in order to make sense of derivatives, while the contraction principle needs the domain to be closed.
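Even without the proof, the theorem's formula can be verified numerically; here is a sketch with $n=2$, $m=1$ and a hypothetical $G$ chosen so that $g$ is known in closed form:

```python
import numpy as np

# G : R^{2+1} -> R, with implicit function g(x1, x2) = sqrt(3 - x1^2 - x2^2)
def G(x1, x2, y):
    return x1**2 + x2**2 + y**2 - 3

def g(x1, x2):
    return np.sqrt(3 - x1**2 - x2**2)

x1, x2 = 0.5, 0.7
y = g(x1, x2)
h = 1e-6

# Finite-difference Jacobians of g and of G in the x and y variables
Dg  = np.array([(g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h),
                (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)])
DxG = np.array([(G(x1 + h, x2, y) - G(x1 - h, x2, y)) / (2 * h),
                (G(x1, x2 + h, y) - G(x1, x2 - h, y)) / (2 * h)])
DyG = (G(x1, x2, y + h) - G(x1, x2, y - h)) / (2 * h)

# Implicit function theorem: Dg = -DyG^{-1} DxG (here DyG is a 1x1 "matrix")
print(np.allclose(Dg, -DxG / DyG, atol=1e-5))  # True
```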


What follows is a geometric argument. We consider the variation of $f(x)$ on the variety $V$ defined by $g_1(x)=\cdots=g_k(x)=0$. The Lagrange condition at $x\in V$ is: $(\star)$ there exist $(\lambda_i)_i$ such that $Df(x)=\sum_i\lambda_i Dg_i(x)$.

Note that $(\star)$ is equivalent to $\cap_i \ker(Dg_i(x))\subset\ker(Df(x))$, that is, $T_xV\subset\ker(Df(x))$, where $T_xV$ is the tangent space of $V$ at $x$.

Finally, the Lagrange condition at $x$ is equivalent to: for every $v\in T_xV$, $f(x+v)=f(x)+o(\|v\|)$. In particular, if $f_{|V}$ admits a local extremum at $x\in V$, then the previous relation is satisfied.
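A small numerical illustration of $(\star)$ with $k=2$ constraints; the specific $f$, $g_1$, $g_2$ (the unit sphere and a plane in $\mathbb{R}^3$) are assumptions for illustration:

```python
import numpy as np

# Maximize f(x, y, z) = x + y + z subject to two constraints:
#   g1 = x^2 + y^2 + z^2 - 1 = 0   (unit sphere)
#   g2 = z = 0                     (plane)
# The constrained maximum is at p = (1/sqrt(2), 1/sqrt(2), 0).
p = np.array([1 / np.sqrt(2), 1 / np.sqrt(2), 0.0])

Df  = np.array([1.0, 1.0, 1.0])   # gradient of f
Dg1 = 2 * p                       # gradient of g1 at p
Dg2 = np.array([0.0, 0.0, 1.0])   # gradient of g2

# Solve Df = lambda1 * Dg1 + lambda2 * Dg2 in the least-squares sense;
# an exact fit confirms Df lies in span{Dg1, Dg2}, as (*) requires.
A = np.column_stack([Dg1, Dg2])
lam, residual, *_ = np.linalg.lstsq(A, Df, rcond=None)
print(np.allclose(A @ lam, Df))   # True
```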


Consider the directional derivative of $f(x,y)$ constrained to the surface $G(x,y)=0$ at some point $(a,b)$.

This will be equal to the projection of $\nabla f(a,b)$ onto the tangent plane of the surface $G(x,y)=0$ at $(a,b)$.

$\nabla G(a,b)$ is the normal vector to the tangent plane.

Setting $\nabla f(a,b)$ parallel to $\nabla G(a,b)$ guarantees that this directional derivative is zero in every tangent direction.
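To make this concrete, here is a minimal sketch that solves one Lagrange problem by hand and checks both conditions; the particular $f$ and $G$ are assumptions chosen for illustration:

```python
import numpy as np

# Example: maximize f(x, y) = x + y subject to G(x, y) = x^2 + y^2 - 1 = 0.
# The Lagrange conditions
#   1 = lambda * 2x,  1 = lambda * 2y,  x^2 + y^2 = 1
# give x = y = 1/sqrt(2) and lambda = 1/(2x).
x = y = 1 / np.sqrt(2)
lam = 1 / (2 * x)

grad_f = np.array([1.0, 1.0])
grad_G = np.array([2 * x, 2 * y])

# grad f is parallel to grad G ...
print(np.allclose(grad_f, lam * grad_G))   # True
# ... so the directional derivative of f along the tangent (-y, x) vanishes
tangent = np.array([-y, x])
print(np.isclose(grad_f @ tangent, 0.0))   # True
```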