What am I misunderstanding about Lagrange multipliers here?


Consider a constrained optimization problem over some Euclidean space; that is, for real-valued functions $f,g$ on that space we seek

$$ \underset{x}{\arg \min} \ f(x)\\ \text{subject to} \ g(x) = 0. $$

Then one can look for candidate optima by means of Lagrange multipliers. That is, a candidate minimum $x^*$ satisfies, for some $\lambda$,

\begin{align} \nabla_x f(x^*) + \lambda \nabla_x g(x^*) = 0 \\ g(x^*) = 0. \end{align}
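As a quick sanity check of these conditions, here is a small symbolic computation with SymPy. The particular choices $f(x,y) = x^2 + y^2$ and $g(x,y) = x + y - 1$ are illustrative examples, not taken from the question:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2   # illustrative objective
g = x + y - 1     # illustrative constraint, g(x, y) = 0

# Stationarity grad f + lam * grad g = 0, together with feasibility g = 0
eqs = [sp.diff(f, x) + lam * sp.diff(g, x),
       sp.diff(f, y) + lam * sp.diff(g, y),
       g]
sol = sp.solve(eqs, [x, y, lam], dict=True)
# unique solution: x = y = 1/2, lam = -1
print(sol)
```

Here the method works exactly as advertised: the constrained minimizer $(1/2, 1/2)$ is recovered along with the multiplier $\lambda = -1$.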

Now consider the same problem with the constraint squared, which we can always do:

$$ \underset{x}{\arg \min} \ f(x)\\ \text{subject to} \ g(x)^2 = 0. $$ Of course, this should yield exactly the same minima as the original problem.

However, when we now look at the Lagrange conditions we have

\begin{align} \nabla_x f(x^*) + \lambda \nabla_x \left[ g(x^*)^2 \right] = \nabla_x f(x^*) + 2 \lambda g(x^*) \nabla_x g(x^*) = 0 \\ g(x^*) = 0. \end{align} But then plugging the second equation into the first, we find that $\nabla_x f(x^*)=0$; that is, $x^*$ would have to solve the unconstrained minimization problem.

What am I doing wrong here? Am I missing some assumption that makes the Lagrange multiplier method invalid in the second case, or what is happening here?

If you want a concrete example:

Let's assume we want to find the closest symmetric matrix $B$ to some not necessarily symmetric matrix $A$. We could then define $f(B) = \vert \vert A - B \vert \vert^2$ and $g(B)^2 = \vert \vert B - B^T \vert \vert^2.$ We then have $$ \nabla_B f(B) + \lambda \nabla_B \left[ g(B)^2 \right] = -2( A - B )+ 4 \lambda (B-B^T), $$ and since $B - B^T = 0$ at every feasible point, an optimal $B^*$ would need to satisfy $A=B^*$.
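The failure in this example can be seen numerically. The true minimizer is the symmetric part $(A + A^T)/2$ (a standard fact for the Frobenius norm), yet the gradient of the squared constraint vanishes at every feasible point, so the stationarity condition collapses to $\nabla_B f = 0$. A sketch with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # a generic non-symmetric matrix

# Closest symmetric matrix in Frobenius norm: the symmetric part of A
B_star = (A + A.T) / 2

# Gradient of f(B) = ||A - B||^2 at B*:  -2 (A - B*)
grad_f = -2 * (A - B_star)
# Gradient of g(B)^2 = ||B - B^T||^2 at B*:  4 (B* - B*^T)
grad_g2 = 4 * (B_star - B_star.T)

print(np.allclose(grad_g2, 0))   # True: the multiplier term vanishes on the constraint set
print(np.allclose(grad_f, 0))    # False: so stationarity would force A = B*, which fails
```

Since the multiplier term is identically zero at the minimizer, no choice of $\lambda$ can make the squared-constraint stationarity condition hold there.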

BEST ANSWER

Looking at the calculus textbook used at my institution, here's the precise statement of the theorem of Lagrange multipliers in its simplest nontrivial case:

Let $f$ and $g$ be functions of two variables with continuous partial derivatives at every point of some open set containing the smooth curve $g(x,y)=0$. Suppose that $f$, when restricted to points on the curve $g(x,y)=0$, has a local extremum at the point $(x_0,y_0)$ and that $\nabla g(x_0,y_0)\neq0$. Then there is a number $\lambda$ for which $\nabla f(x_0,y_0)= \lambda \nabla g(x_0,y_0)$.

(A similar statement can be found in Stewart's books.) I've taught multivariable calculus four or five times at my present institution, as well as a handful of times at other places earlier, and I don't think I've ever once written this precise statement on the board in class, so I spent a good long while puzzling over your question before finally looking it up. The point is that the condition $\nabla g \neq 0$ is really important, but it rarely comes up "in practice" (say, in the exercises one assigns multivariable calculus students). What you've found is a clever example of a really perverse situation: by squaring the constraint you make $\nabla g = 0$ everywhere on the constraint curve, which renders the theorem completely useless.
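This point can be checked symbolically: for any smooth $g$, the gradient of $g^2$ is $2g\nabla g$, which is identically zero on the set $g = 0$. A minimal sketch with SymPy, using an illustrative linear constraint (not one from the question):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
g = x + y - 1    # an illustrative smooth constraint with grad g != 0 everywhere
h = g**2         # the squared constraint

grad_h = sp.Matrix([sp.diff(h, x), sp.diff(h, y)])

# Restrict to the constraint curve g = 0 by substituting y = 1 - x:
on_curve = sp.simplify(grad_h.subs(y, 1 - x))
print(on_curve)  # Matrix([[0], [0]])
```

So even though $\nabla g$ is nonzero everywhere, $\nabla(g^2)$ vanishes at every feasible point, and the hypothesis of the theorem fails at every candidate minimizer.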