Proving Lagrange method by using Implicit Function Theorem


I am trying to prove the Lagrange multiplier method. In general: if $f, g : \mathbb{R}^{D+1} \to \mathbb{R}$ and the point $p = (x',y')$, where $x'$ is a $D$-dimensional vector and $y'$ is a scalar, is a local extremum of $f$ subject to the constraint $g(x,y)=0$, and if $\dfrac{\partial g(p)}{\partial y} \neq 0$, then $\nabla f(p) = \lambda \nabla g(p)$ for some scalar $\lambda$.

My approach uses the implicit function theorem as follows. Since $\dfrac{\partial g(p)}{\partial y} \neq 0$, there exist an $r > 0$ and a function $h: B(x',r) \to \mathbb{R}$ with $h(x') = y'$ such that $g(x,h(x))=0$ for every $x \in B(x',r)$. On this ball, $F(x) = f(x,h(x))$ automatically satisfies the constraint, so the unconstrained optimization of $F$ over $B(x',r)$ has a local extremum at $x'$, where the gradient of $F$ vanishes. Hence at $p=(x',y')$, for each $i = 1, \dots, D$, $$\dfrac{\mathrm{d} F}{\mathrm{d} x_i}=\dfrac{\partial f}{\partial x_i} + \dfrac{\partial f}{\partial y}\dfrac{\partial h}{\partial x_i} = 0.$$

We can differentiate $g(x,h(x))$ with respect to $x$ at $p=(x',y')$ as well; since $g(x,h(x))$ is identically zero on the ball, this derivative trivially vanishes: $$\dfrac{\mathrm{d} g}{\mathrm{d} x_i}=\dfrac{\partial g}{\partial x_i} + \dfrac{\partial g}{\partial y}\dfrac{\partial h}{\partial x_i} = 0, \quad i = 1, \dots, D.$$
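As a sanity check, these two identities can be verified numerically on a concrete example of my own choosing (not from the question): $f(x,y) = x + y$ with constraint $g(x,y) = x^2 + y^2 - 1 = 0$, whose constrained maximum is $p = (1/\sqrt{2},\, 1/\sqrt{2})$ and where $\partial g/\partial y = 2y' \neq 0$, so $h(x) = \sqrt{1-x^2}$ works near $x'$:

```python
import math

# Example (my own, for illustration): f(x,y) = x + y,
# g(x,y) = x^2 + y^2 - 1, constrained maximum at p = (1/sqrt(2), 1/sqrt(2)).
# Near x' the implicit function theorem gives h(x) = sqrt(1 - x^2).

def h(x):
    return math.sqrt(1.0 - x * x)

def F(x):          # F(x) = f(x, h(x)): the objective restricted to the constraint
    return x + h(x)

def G(x):          # G(x) = g(x, h(x)): identically zero on the ball
    return x * x + h(x) ** 2 - 1.0

xp = 1.0 / math.sqrt(2.0)   # x-coordinate of the constrained extremum
eps = 1e-6

# Central finite differences for the total derivatives at x'
dF = (F(xp + eps) - F(xp - eps)) / (2 * eps)
dG = (G(xp + eps) - G(xp - eps)) / (2 * eps)

print(abs(dF) < 1e-4)   # dF/dx vanishes at x' (extremum condition) -> True
print(abs(dG) < 1e-4)   # dg/dx vanishes identically on the ball     -> True
```

Both total derivatives vanish at $x'$, matching the two displayed systems of equations.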

These equations contain all the components of the gradients of the two functions $f$ and $g$, but I am still unable to show $\nabla f(p) = \lambda \nabla g(p)$. What is missing in my construction, and how can we reach this statement from the derivatives written above?

On BEST ANSWER

One writes, as you did, that the gradient of $\alpha(x)=f(x,h(x))$ with respect to $x$ vanishes at $x'$. Using the chain rule to compute the derivatives with respect to the $x_j$'s: $$ 0=\frac{\partial\alpha}{\partial x_j}=\Big(\sum_{i=1}^D\frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial x_j}\Big)+ \frac{\partial f}{\partial y}\dfrac{\partial h}{\partial x_j}=\frac{\partial f}{\partial x_j}+ \frac{\partial f}{\partial y}\dfrac{\partial h}{\partial x_j} \quad (1\le j\le D). $$ Similarly, from $g(x,h(x))=0$ one gets $$ 0=\frac{\partial g}{\partial x_j}+ \frac{\partial g}{\partial y}\dfrac{\partial h}{\partial x_j}\quad (1\le j\le D). $$ The first system of equations says that $\nabla f$ is proportional to $\nabla h$, and the latter gradient can be solved from the second system, which is possible by the assumption $\frac{\partial g}{\partial y}\ne0$. Explicitly (everything evaluated at the given point, with $\nabla$ denoting the gradient with respect to $x$): $$ \nabla f=-\frac{\partial f}{\partial y}\nabla h=\frac{\frac{\partial f}{\partial y}}{\frac{\partial g}{\partial y}}\nabla g=\lambda\nabla g. $$ Properly speaking, the two systems give $\lambda=\frac{\partial f/\partial y}{\partial g/\partial y}$ for the $x_i$-components of the gradient, while for the $y$-component the identity $\frac{\partial f}{\partial y}=\lambda\frac{\partial g}{\partial y}$ holds trivially with this $\lambda$. In fact, we could have predicted the value of $\lambda$ just by looking at that last component.
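The conclusion, including the explicit formula $\lambda = \frac{\partial f/\partial y}{\partial g/\partial y}$, can be checked on the same illustrative example as before (my own choice, not from the post): $f(x,y) = x + y$, $g(x,y) = x^2 + y^2 - 1$, extremum $p = (1/\sqrt{2},\, 1/\sqrt{2})$:

```python
import math

# Example (my own, for illustration): f(x,y) = x + y, g(x,y) = x^2 + y^2 - 1,
# constrained extremum at p = (1/sqrt(2), 1/sqrt(2)).
xp = yp = 1.0 / math.sqrt(2.0)

grad_f = (1.0, 1.0)            # (df/dx, df/dy) at p
grad_g = (2 * xp, 2 * yp)      # (dg/dx, dg/dy) at p

# lambda is read off the y-components, as the answer notes:
lam = grad_f[1] / grad_g[1]    # (df/dy) / (dg/dy) = 1/sqrt(2)

# The x-components must then match with the same lambda:
print(abs(grad_f[0] - lam * grad_g[0]) < 1e-12)   # True: grad f = lambda * grad g
```

Here $\nabla f(p) = (1,1)$ and $\nabla g(p) = (\sqrt{2}, \sqrt{2})$, so indeed $\nabla f(p) = \frac{1}{\sqrt{2}}\,\nabla g(p)$ in every component.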