A complete proof of the Lagrange multipliers theorem using the implicit function theorem


While searching on MSE, I couldn't find a complete, rigorous proof of the method of Lagrange multipliers using the implicit function theorem. I tried to write a complete proof myself below, but I am not sure about some details, mainly the reordering-of-coordinates part. Any advice is much appreciated.

Theorem. Let $f:\mathbb{R}^d\to\mathbb{R}$ and $h:\mathbb{R}^d\to\mathbb{R}^n$ be continuously differentiable functions, $C\in\mathbb{R}^n$, and $M=\{h=C\}$. Assume that $\text{rank } h'(x)=n$ for all $x\in M$. If $f$ attains a constrained local extremum at $a$, subject to the constraint $h(a)=C$, then there exist $\lambda_1,\dots,\lambda_n\in\mathbb{R}$ such that

$$\nabla f(a)=\lambda_1\nabla h_1(a)+\dots+\lambda_n\nabla h_n(a), \quad \quad (1)$$

where $\nabla f(a)=[\frac{\partial f}{\partial x_1} (a),\dots,\frac{\partial f}{\partial x_d} (a)]^\top$ and $\nabla h_i(a)=[\frac{\partial h_i}{\partial x_1} (a),\dots,\frac{\partial h_i}{\partial x_d} (a)]^\top$ for $i=1,\dots,n$.
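Before the proof, it may help to sanity-check the statement on a toy instance (my own example, not part of the theorem): maximize $f(x,y)=x+y$ over the unit circle $h(x,y)=x^2+y^2=1$, where the constrained maximum is at $a=(1/\sqrt2,1/\sqrt2)$.

```python
import numpy as np

# Toy instance (d = 2, n = 1): maximize f(x, y) = x + y subject to
# h(x, y) = x^2 + y^2 = 1. The constrained maximum is at a = (1/sqrt2, 1/sqrt2).
a = np.array([1.0, 1.0]) / np.sqrt(2)

grad_f = np.array([1.0, 1.0])   # ∇f(a)
grad_h = 2 * a                  # ∇h(a) = (2x, 2y)

# Solve ∇f(a) = λ ∇h(a) for λ using the first component, then check the rest.
lam = grad_f[0] / grad_h[0]     # λ = 1/sqrt(2) ≈ 0.7071
assert np.allclose(grad_f, lam * grad_h)
print(lam)
```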

Proof. Suppose $f$ attains a constrained local extremum at $a$. First note that $h'(a)=[\nabla h_1(a),\dots, \nabla h_n(a)]^\top$ is $n\times d$, and so $\text{rank } h'(a)=n$ implies that $n\leq d$ and that $h'(a)$ has $n$ linearly independent columns. If $n=d$, then $h'(a)$ is invertible, and we can write $f'(a)=\Lambda h'(a)$ with $\Lambda:=f'(a)[h'(a)]^{-1}=[\lambda_1,\dots,\lambda_n]$. Transposing gives $(1)$. Hence assume WLOG that $n<d$.

Next we claim that we can assume WLOG that the first $n$ columns of $h'(a)$ are linearly independent. Otherwise there exists a permutation $\pi:\{1,\dots,d\}\to \{1,\dots,d\} $ that reorders the columns of $h'(a)$ so that it holds. Define $f^*:\mathbb{R}^d\to\mathbb{R}$ and $h^*:\mathbb{R}^d\to\mathbb{R}^n$ by

$$f^*(x_1,\dots,x_d)=f(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)}), \quad h^*(x_1,\dots,x_d)=h(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)})$$

Then we have

$$\frac{\partial f^*}{\partial x_j}(x_1,\dots,x_d)=\frac{\partial f}{\partial x_{\pi(j)}}(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)}) \quad 1\leq j\leq d,$$

$$\frac{\partial h_i^*}{\partial x_j}(x_1,\dots,x_d)=\frac{\partial h_i}{\partial x_{\pi(j)}}(x_{\pi^{-1}(1)},\dots,x_{\pi^{-1}(d)}) \quad 1\leq i\leq n, \quad 1\leq j\leq d$$

and so the continuous differentiability of $f,h$ implies the continuous differentiability of $f^*,h^*$. Let $a^*=(a_{\pi(1)},\dots,a_{\pi(d)})$, so that $a^*$ is $a$ with its coordinates permuted and $f^*(a^*)=f(a)$, $h^*(a^*)=h(a)$. Then

$$\frac{\partial f^*}{\partial x_j}(a^*)=\frac{\partial f}{\partial x_{\pi(j)}}(a), \quad 1\leq j\leq d,$$

$$\frac{\partial h_i^*}{\partial x_j}(a^*)=\frac{\partial h_i}{\partial x_{\pi(j)}}(a), \quad 1\leq i\leq n, \quad 1\leq j\leq d.$$

It follows that the columns of $f^{*\prime}(a^*)$ and $h^{*\prime}(a^*)$ are those of $f'(a)$ and $h'(a)$ permuted according to $\pi$. Moreover, $f^*$ attains a constrained local extremum at $a^*$, subject to the constraint $h^*(a^*)=C$. Since condition $(1)$ for $f,h$ at $a$ is equivalent to condition $(1)$ for $f^*,h^*$ at $a^*$, we have reduced the problem to the case where the first $n$ columns of $h'(a)$ are linearly independent.
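The reordering step can be illustrated numerically (toy numbers of my own, with the $\pi$ vs. $\pi^{-1}$ bookkeeping as in the text): permuting the input coordinates permutes the columns of the Jacobian, so a permutation can always bring $n$ independent columns to the front.

```python
import numpy as np

# A hypothetical h'(a) (n = 2, d = 3) whose first column is zero, so its
# first n columns are NOT linearly independent as given.
J = np.array([[0.0, 2.0, 1.0],
              [0.0, 1.0, 3.0]])

perm = [1, 2, 0]          # reorder coordinates: move independent columns forward
J_star = J[:, perm]       # Jacobian of h* at a* = columns of h'(a) permuted

# After permuting, the first n = 2 columns are linearly independent.
assert np.linalg.matrix_rank(J_star[:, :2]) == 2
```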

As a last reduction, we may assume WLOG that $C=0$, for otherwise we can replace $h$ by $h-C$.

Let $m=d-n$ and denote points in $\mathbb{R}^d=\mathbb{R}^{n+m}$ by $(x,y)$ with $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$. Then $h$ satisfies the hypotheses of the implicit function theorem at the point $a=(a_x,a_y)$, as found in Rudin PMA Theorem 9.27 for example: $h$ is continuously differentiable, $h(a_x,a_y)=0$, and $h_x'(a)$ is invertible since the first $n$ columns of $h'(a)$ are linearly independent. Therefore, there exist open sets $U\subset \mathbb{R}^{n+m}$ and $W\subset \mathbb R^m$ with $(a_x,a_y)\in U$ and $a_y\in W$, and a continuously differentiable function $g:W\to\mathbb R^n$ such that $g(a_y)=a_x$ and $h(g(y),y)=0$ for all $y\in W$. In particular we have

$$\{(g(y),y):y\in W\}\subset M \quad \quad (2)$$

Define the function $F:W\to \mathbb R$ by $F(y)=f(g(y),y)$. This function is differentiable on $W$, and the chain rule gives $F'(y)=f'(g(y),y)\begin{bmatrix} g'(y)\\ I_m \end{bmatrix}$. We claim that $F$ has an unconstrained local extremum at $a_y$. Indeed, since $f$ attains a constrained local extremum at $a$, there exists $r>0$ such that $f|_{M\cap B_r(a)}$ attains an extremum at $a$; and since $g$ is continuous at $a_y$ with $g(a_y)=a_x$, we can choose $\delta>0$ small enough that, by $(2)$, $(g(y),y)\in M\cap B_r(a)$ whenever $y\in W\cap B_{\delta}(a_y)$. Using the fact that the derivative must vanish at a local extremum, and partitioning $f'(a)=[f_x'(a) \quad f_y'(a)]$, we get

$$0=F'(a_y)=[f_x'(a) \quad f_y'(a)]\begin{bmatrix} g'(a_y)\\ I_m \end{bmatrix}=f_x'(a)g'(a_y)+ f_y'(a) \quad \quad (3)$$

Next consider the function $H:W\to\mathbb R^{n}$ defined by $H(y)=h(g(y),y)$. This function is differentiable on $W$, and the chain rule gives $H'(y)=h'(g(y),y)\begin{bmatrix} g'(y)\\ I_m \end{bmatrix}$. Moreover we have $H=0$ on $W$ by definition of $g$, and so in particular

$$0=H'(a_y)=[h_x'(a) \quad h_y'(a)]\begin{bmatrix} g'(a_y)\\ I_m \end{bmatrix}=h_x'(a)g'(a_y)+ h_y'(a) \quad \quad (4)$$

where we partitioned $h'(a)=[h_x'(a) \quad h_y'(a)]$. Since the first $n$ columns of $h'(a)$ are linearly independent, $h_x'(a)$ is invertible, and from $(4)$ we get

$$g'(a_y)=-[h_x'(a)]^{-1}h_y'(a) \quad \quad (5)$$
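Identity $(5)$ is easy to check numerically on a toy constraint (my own example): for the unit circle $h(x,y)=x^2+y^2-1$ with $n=m=1$, we have $g(y)=\sqrt{1-y^2}$ near a point with $a_x>0$, and $(5)$ predicts $g'(a_y)=-(2a_x)^{-1}(2a_y)=-a_y/a_x$.

```python
import numpy as np

# Toy check of (5): h(x, y) = x^2 + y^2 - 1, so g(y) = sqrt(1 - y^2) near a.
ax, ay = 0.6, 0.8                 # a point on the circle with h_x'(a) = 2*ax != 0
g = lambda y: np.sqrt(1 - y**2)

# Central finite-difference approximation of g'(a_y).
eps = 1e-6
g_prime_fd = (g(ay + eps) - g(ay - eps)) / (2 * eps)

# Formula (5): g'(a_y) = -[h_x'(a)]^{-1} h_y'(a) = -ay/ax = -4/3
g_prime_formula = -ay / ax
assert np.isclose(g_prime_fd, g_prime_formula, atol=1e-5)
```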

Substituting $(5)$ in $(3)$ and solving for $f_y'(a) $ we obtain

$$f_y'(a) =\Lambda h_y'(a) \quad \quad (6)$$

where $\Lambda:=f_x'(a)[h_x'(a)]^{-1}=[\lambda_1,\dots,\lambda_n]$. But we also have

$$f'_x(a)=f'_x(a)[h_x'(a)]^{-1}h_x'(a)=\Lambda h_x'(a) \quad \quad (7)$$

so combining $(6)$ and $(7)$ gives

$$f'(a)=[f'_x(a) \quad f'_y(a)]=\Lambda [h_x'(a)\quad h_y'(a)]=\Lambda h'(a)$$

Transposing gives $(1)$, as desired.
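As a final sanity check of the conclusion $f'(a)=\Lambda h'(a)$ with $\Lambda=f_x'(a)[h_x'(a)]^{-1}$, here is a toy instance with $d=3$, $n=2$ (my own example): minimize $f(x)=\|x\|^2$ subject to $x_1+x_2+x_3=1$ and $x_1-x_2=0$, whose minimizer is $a=(1/3,1/3,1/3)$.

```python
import numpy as np

# Toy instance (d = 3, n = 2): minimize f(x) = |x|^2 subject to
# h1(x) = x1 + x2 + x3 = 1 and h2(x) = x1 - x2 = 0; the minimizer is a below.
a = np.array([1/3, 1/3, 1/3])
f_prime = 2 * a                             # f'(a) as a row vector
h_prime = np.array([[1.0,  1.0, 1.0],       # h'(a), shape n x d
                    [1.0, -1.0, 0.0]])

n = 2
hx, hy = h_prime[:, :n], h_prime[:, n:]     # partition as in the proof
fx = f_prime[:n]

Lam = fx @ np.linalg.inv(hx)                # Λ = f_x'(a) [h_x'(a)]^{-1}
assert np.allclose(f_prime, Lam @ h_prime)  # f'(a) = Λ h'(a), i.e. condition (1)
print(Lam)                                  # the multipliers λ1, λ2
```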