Lagrange multiplier without implicit function theorem

526 Views Asked by Bumbble Comm At 15 May 2026 - 8:20

Here is a proof of the Lagrange multiplier method from Calculus Early Transcendentals by James Stewart (8th ed). It does not rely on the Implicit Function Theorem like all other "rigorous" proofs seem to. What is the missing piece from this proof (which I guess relies on the Implicit Function Theorem) that would make this rigorous?

Suppose that a function $f$ has an extreme value at a point $(x_0, y_0, z_0)$ on the surface $S$ (the constraint surface $g(x, y, z) = k$) and let $C$ be a curve with vector equation $\vec{r}(t)=(x(t), y(t), z(t))$ that lies on $S$ and passes through $(x_0, y_0, z_0)$. If $t_0$ is the parameter value corresponding to the point $(x_0, y_0, z_0)$, then $\vec{r}(t_0)=(x_0, y_0, z_0)$. The composite function $h(t)=f(x(t), y(t), z(t))$ represents the values that $f$ takes on the curve $C$. Since $f$ has an extreme value at $(x_0, y_0, z_0)$, it follows that $h$ has an extreme value at $t_0$, so $h'(t_0) = 0$. But if $f$ is differentiable, we can use the Chain Rule to write $$0 = h'(t_0) = \nabla f(x_0, y_0, z_0) \cdot \vec{r}'(t_0).$$

This shows that the gradient vector $\nabla f(x_0, y_0, z_0)$ is orthogonal to the tangent vector $\vec{r}'(t_0)$ to every such curve $C$. We know that the gradient of $g$, $\nabla g(x_0, y_0, z_0)$, is also orthogonal to $\vec{r}'(t_0)$ for every such curve. This means that the gradient vectors $\nabla f(x_0, y_0, z_0)$ and $\nabla g(x_0, y_0, z_0)$ must be parallel.

Alternatively, an even simpler proof from MIT OCW goes as follows:

Consider any unit vector $\hat{u}$ at the critical point that is tangent to the constraint surface. At the critical point, the directional derivative along $\hat{u}$ vanishes, $D_{\hat{u}} f = \nabla f \cdot \hat{u} = 0$, so $\nabla f$ is perpendicular to any such $\hat{u}$. We know $\nabla g$ is perpendicular to the level sets of $g$, so $\nabla g$ is also perpendicular to any such $\hat{u}$, implying $\nabla f$ and $\nabla g$ are parallel.

What does introducing $\vec{r}(t)$ in the Stewart proof give us over this one? And, again, what is the piece here that needs to be shown more rigorously (presumably using the Implicit Function Theorem)?

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2019-04-13 09:53:10

The two proofs are equivalent, apart from slight, inconsequential differences that I clarify below.

At this level, it's helpful to borrow some intuition from physics (after all, that's where calculus came from).

Let's use just two coordinates instead of three to make things easier to visualize:

We have a hill, and $f(x,y)$ is the height of the hill at $(x,y)$. A hiker's horizontal location (horizontal since we are not using $z$) at any time $t$ is given by $\vec{r}(t)$ in Stewart, which basically gives us the entire history of the hiker's movement. OCW concerns itself only with the hiker's movement near the extremum (and doesn't bother making it explicit), since elsewhere it's irrelevant. OCW also specifies that the hiker travels at unit speed, which is inconsequential here; Stewart doesn't specify the speed. These are the slight differences.

Now, if we write out the derivative in OCW (making the location explicit as in Stewart), it's (evaluated at $t = 0$):

$$ \frac{d}{dt} f(\vec{r}(t_0)+\hat u t) $$

For Stewart, it's (evaluated at $t = t_0$):

$$ \frac{d}{dt} f(\vec{r}(t))$$

In the first case, applying the chain rule, we get:

$$ \nabla f(\vec{r}(t_0)) \cdot \hat u$$

In the second case:

$$ \nabla f(\vec{r}(t_0)) \cdot \vec{r}'(t_0)$$

So, same conclusion.
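As a rough numerical check of the comparison above, here is a minimal sketch under assumed data that is not in the original answer: the hill is $f(x,y) = x + y$, and the hiker walks the circle $x^2 + y^2 = 2$ via $\vec{r}(t) = (\sqrt{2}\cos t, \sqrt{2}\sin t)$, so the constrained maximum sits at $t_0 = \pi/4$. The helper names (`derivative`, `compare`) are hypothetical. Derivatives are approximated by central differences, so the values are only accurate up to a small numerical error.

```python
import math


def f(x, y):
    """Height of the (assumed) hill."""
    return x + y


def r(t):
    """Assumed hiker path: the circle x^2 + y^2 = 2."""
    return (math.sqrt(2) * math.cos(t), math.sqrt(2) * math.sin(t))


def derivative(h, t, eps=1e-6):
    """Central-difference approximation of h'(t)."""
    return (h(t + eps) - h(t - eps)) / (2 * eps)


def compare(t0):
    """Return (Stewart derivative, OCW derivative, speed |r'(t0)|)."""
    x0, y0 = r(t0)

    # Stewart: d/dt f(r(t)) evaluated at t = t0.
    stewart = derivative(lambda t: f(*r(t)), t0)

    # Unit tangent u = r'(t0) / |r'(t0)|.
    rx = derivative(lambda t: r(t)[0], t0)
    ry = derivative(lambda t: r(t)[1], t0)
    speed = math.hypot(rx, ry)
    ux, uy = rx / speed, ry / speed

    # OCW: d/dt f(r(t0) + u t) evaluated at t = 0.
    ocw = derivative(lambda t: f(x0 + ux * t, y0 + uy * t), 0.0)
    return stewart, ocw, speed


if __name__ == "__main__":
    for label, t in [("extremum", math.pi / 4), ("generic point", 0.0)]:
        stewart, ocw, speed = compare(t)
        print(label, stewart, ocw, speed * ocw)
```

At the extremum both derivatives should come out close to zero. At a generic point they differ only by the factor $|\vec{r}'(t_0)|$, which is why the unit-speed assumption does not change where the derivative vanishes.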

Personally, I think Stewart's approach presents it in a more intuitive way (and painstakingly names every detail), so it is easier for beginners to understand. OCW's approach is more pragmatic, and you will be using that kind of notation later on. There is no difference in terms of rigor.

**Bumbble Comm** · Answer 2 · 2019-04-13 21:22:31

The point where you really require the implicit function theorem is when you start talking about "constraint surface" and "tangents". How can you know that your constraints locally determine some smooth surface?
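A standard illustration of why this matters (not part of the original answer, and chosen here only as an example): take the constraint $g(x,y) = y^2 - x^3 = 0$ and minimize $f(x,y) = x$. On the constraint set, $x = (y^2)^{1/3} \ge 0$, so the constrained minimum is at the origin. There,

$$\nabla f(0,0) = (1, 0), \qquad \nabla g(0,0) = (-3x^2, 2y)\big|_{(0,0)} = (0, 0),$$

so no $\lambda$ satisfies $\nabla f = \lambda \nabla g$. The curve has a cusp at the origin: the parametrization $\vec{r}(t) = (t^2, t^3)$ is differentiable but has $\vec{r}'(0) = \vec{0}$, so the relation $\nabla f \cdot \vec{r}'(0) = 0$ holds trivially and says nothing about the direction of $\nabla f$. The hypothesis $\nabla g \neq \vec{0}$ is what the implicit function theorem uses to guarantee a smooth constraint curve with a nonzero tangent.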

For the Lagrange multiplier method itself, a weaker part of the IFT is enough; it follows directly from local surjectivity. (In this answer the $f_i$ are the constraints and $g$ is the function being optimized.) If $a$ is a point such that $f_1(a)=\ldots=f_n(a)=0$ and the gradients $f_1',\dots,f_n',g'$ are linearly independent, then the map $(f_1,\ldots,f_n,g)$ maps every ball around $a$ onto a neighbourhood of $(0,\ldots,0,g(a))$. So in every ball around $a$ there exist points $b,c$ such that $f_1(b)=\ldots=f_n(b)=0$ and $g(b)>g(a)$, and $f_1(c)=\ldots=f_n(c)=0$ and $g(c)<g(a)$; this proves that there cannot be any local constrained extremum at $a$. Hence, at all constrained extremal points, the gradients $f_1',\dots,f_n',g'$ must be linearly dependent.
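The argument can be illustrated numerically with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the answer: one constraint $f_1(x,y) = x^2 + y^2 - 2$, the objective $g(x,y) = x + y$, and hypothetical helper names. Because the constraint is a circle, nearby constrained points can be written down directly by changing the angle; the sketch does not prove local surjectivity, it only exhibits the points $b$ and $c$ the argument predicts.

```python
import math

RADIUS = math.sqrt(2)  # assumed constraint: the circle x^2 + y^2 = 2


def point(theta):
    """Point on the constraint circle at angle theta."""
    return (RADIUS * math.cos(theta), RADIUS * math.sin(theta))


def constraint(x, y):
    """Plays the role of f_1; zero on the circle."""
    return x * x + y * y - 2.0


def objective(x, y):
    """Plays the role of g."""
    return x + y


def gradient_determinant(x, y):
    """det of the rows grad f_1 = (2x, 2y) and grad g = (1, 1).

    Nonzero exactly when the two gradients are linearly independent.
    """
    return 2.0 * x - 2.0 * y


def straddles(theta, ball_radius=1e-3):
    """Check for constrained points b, c in the ball with g(c) < g(a) < g(b).

    The samples sit at angles theta +/- delta; their distance from a is at
    most RADIUS * delta, which is smaller than ball_radius.
    """
    delta = ball_radius / 2.0
    a = objective(*point(theta))
    b = objective(*point(theta + delta))
    c = objective(*point(theta - delta))
    return min(b, c) < a < max(b, c)


if __name__ == "__main__":
    for theta in (0.0, math.pi / 4):
        x, y = point(theta)
        print(theta, gradient_determinant(x, y), straddles(theta))
```

At $\theta = 0$ the gradients are independent and the objective takes values on both sides of $g(a)$ arbitrarily close to $a$, so $a$ cannot be a constrained extremum. At $\theta = \pi/4$, the constrained maximum, the determinant vanishes and the sampled neighbours both lie below $g(a)$.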
