Constrained optimisation question


[Image of the theorem statement not reproduced. From the answers below: the theorem concerns a local extremum $x_1$ of $f$ restricted to a constraint set $S = \{x : g(x) = 0\}$, and equation (3) is the multiplier condition $Df(x_1) = \lambda\, Dg(x_1)$.]

Since $f$ has a local extremum at $x_1$, surely the LHS of equation (3) is always zero? If so, isn't $\lambda$ then always simply zero too? But this cannot be, otherwise the last sentence of the theorem wouldn't be phrased the way it is. What am I missing?


3 Answers

BEST ANSWER

It's not $f$ that has a local extremum at $x_{1}$, but rather $f|_{S}$. Consider, for example, $f(x,y)=xy$ restricted to the line $y=1-x$. This restricted function has a local maximum at $(\frac{1}{2},\frac{1}{2})$, but the unrestricted function $f$ does not have an extremum at $(\frac{1}{2},\frac{1}{2})$.
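To make this concrete (writing the constraint as $g(x,y) = x + y - 1 = 0$, a labelling not in the original answer), one can check the multiplier condition directly:
$$\nabla f(x,y) = (y,\ x), \qquad \nabla g(x,y) = (1,\ 1),$$
$$\nabla f\left(\tfrac{1}{2},\tfrac{1}{2}\right) = \left(\tfrac{1}{2},\ \tfrac{1}{2}\right) = \tfrac{1}{2}\,\nabla g\left(\tfrac{1}{2},\tfrac{1}{2}\right),$$
so $\lambda = \tfrac{1}{2} \neq 0$, even though the restricted function $x \mapsto x(1-x)$ has derivative $1 - 2x$, which does vanish at $x = \tfrac{1}{2}$.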

ANSWER

The function $g$ defines a surface (in this case a curve) over which we have to minimize the value of $f$. The condition simply means that the gradients $\nabla f$ and $\nabla g$ must be parallel or anti-parallel; otherwise we could change the value of $f$ by moving appropriately along the constraint set.
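One way to make that sketch precise (assuming $\nabla g(x) \neq 0$; this decomposition is not spelled out in the original answer) is to split the gradient of $f$ into components along and orthogonal to $\nabla g$:
$$\nabla f(x) = \lambda\,\nabla g(x) + v, \qquad v \perp \nabla g(x).$$
If $v \neq 0$, then $v$ is tangent to the constraint set at $x$ (to first order, moving along $v$ keeps $g$ constant), and
$$\left.\frac{d}{dt} f(x + t v)\right|_{t=0} = \nabla f(x) \cdot v = \|v\|^2 > 0,$$
so $f$ increases along $v$ and decreases along $-v$, and $x$ cannot be a constrained extremum. Hence $v = 0$ at an extremum, i.e. $\nabla f(x) = \lambda\,\nabla g(x)$.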

For more, read about the method of Lagrange multipliers.

ANSWER

The answer is that the appropriate gradient, namely the reduced gradient introduced below, is the one that vanishes, not $Df$ itself.

Suppose $\hat{x}$ is a minimizer of $f$ subject to the constraint $g(x) = 0$, and $Dg(\hat{x}) \neq 0$. Suppose in particular that $Dg(\hat{x})e_1 \neq 0$ (at least one component is non-zero; it simplifies notation to assume it is the first). Then the implicit function theorem gives the existence of a function $\zeta: \mathbb{R}^{n-1} \to \mathbb{R}$ that satisfies $g(\zeta(x_2,\dots,x_n), x_2,\dots,x_n) = 0$ for all $(x_2,\dots,x_n)$ in a neighbourhood of $(\hat{x}_2,\dots,\hat{x}_n)$, with $\zeta(\hat{x}_2,\dots,\hat{x}_n) = \hat{x}_1$ and, furthermore,
$$\frac{\partial \zeta}{\partial x_k}(\hat{x}_2,\dots,\hat{x}_n) = - \left( \frac{\partial g}{\partial x_1}(\hat{x}) \right)^{-1} \frac{\partial g}{\partial x_k}(\hat{x}), \qquad k = 2,\dots,n.$$

Then $(\hat{x}_2,\dots,\hat{x}_n)$ is a minimizer for $\phi$ defined by $\phi(x_2,\dots,x_n) = f(\zeta(x_2,\dots,x_n), x_2,\dots,x_n)$, and so we have that $D\phi(\hat{x}_2,\dots,\hat{x}_n) = 0$. Using the chain rule, this translates to
$$\frac{\partial \phi}{\partial x_k}(\hat{x}_2,\dots,\hat{x}_n) = \frac{\partial f}{\partial x_1}(\hat{x})\,\frac{\partial \zeta}{\partial x_k}(\hat{x}_2,\dots,\hat{x}_n) + \frac{\partial f}{\partial x_k}(\hat{x}) = 0.$$
Substituting the value of $\partial \zeta / \partial x_k$ from above gives
$$\frac{\partial f}{\partial x_k}(\hat{x}) = \frac{\partial f}{\partial x_1}(\hat{x}) \left( \frac{\partial g}{\partial x_1}(\hat{x}) \right)^{-1} \frac{\partial g}{\partial x_k}(\hat{x}), \qquad k = 2,\dots,n.$$
Setting
$$\lambda = \frac{\partial f}{\partial x_1}(\hat{x}) \left( \frac{\partial g}{\partial x_1}(\hat{x}) \right)^{-1}$$
makes the same relation hold trivially for $k = 1$ as well, giving the desired result: $Df(\hat{x}) = \lambda\, Dg(\hat{x})$.

Note that this is equivalent to the reduced gradient being zero, that is, $D\phi(\hat{x}_2,\dots,\hat{x}_n) = 0$.
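As a concrete instance of this derivation (reusing the example from the accepted answer; the specific computations below are not in the original answer), take $f(x_1,x_2) = x_1 x_2$ and $g(x_1,x_2) = x_1 + x_2 - 1$. Then
$$\zeta(x_2) = 1 - x_2, \qquad \phi(x_2) = f(\zeta(x_2), x_2) = (1 - x_2)\,x_2,$$
and $\phi'(x_2) = 1 - 2x_2$ vanishes exactly at $x_2 = \tfrac{1}{2}$ (a constrained maximum here, but the critical-point argument is identical). At $\hat{x} = (\tfrac{1}{2}, \tfrac{1}{2})$,
$$\lambda = \frac{\partial f}{\partial x_1}(\hat{x}) \left( \frac{\partial g}{\partial x_1}(\hat{x}) \right)^{-1} = \tfrac{1}{2} \cdot 1^{-1} = \tfrac{1}{2},$$
which matches $Df(\hat{x}) = (\tfrac{1}{2}, \tfrac{1}{2}) = \tfrac{1}{2}\,(1,1) = \lambda\, Dg(\hat{x})$.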

Also note that the condition $Dg(\hat{x}) \neq 0$ is required to use the implicit function theorem. If this condition is not met, then a multiplier of the above form may not exist.

For example, if the constraint were $g(x) = \|x\|^2 = 0$, then, regardless of $f$, the only feasible point is $\hat{x} = 0$; but $Dg(0) = 0$, so the Lagrange multiplier condition cannot be used.
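To see the failure explicitly, pick, say, $f(x) = x_1$ (this choice of $f$ is an illustration not in the original answer). Then
$$Dg(x) = 2x \implies Dg(0) = 0, \qquad Df(0) = e_1 \neq 0,$$
so $Df(0) = \lambda\, Dg(0) = 0$ has no solution $\lambda$, even though $\hat{x} = 0$ is trivially the constrained minimizer.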