Why doesn't minimizing the Lagrangian derivative with regards to the multiplier work?


Given a typical optimization problem for a scalar function of a vector $x$ with a set of equality constraints $$ \min_x f(x)\\ \text{s.t.}\quad g(x)=\mathbf{0} $$ one defines a Lagrangian with a vector of multipliers, one per constraint: $$ \mathcal{L}(x,\lambda) = f(x)+g(x)^T\lambda $$ Using subscripts to denote differentiation, the solution $(x^*,\lambda^*)$ satisfies $$ \mathcal{L}_x(x^*,\lambda^*)=0, \qquad g(x^*)=\mathbf{0} $$ When solving numerically, one uses the Hessian and solves the following linear system to compute a step $$ \left(\begin{matrix}\mathcal{L}_{xx} & g_x^T \\ g_x & 0\end{matrix}\right) \left(\begin{matrix}s_x \\ s_\lambda \end{matrix}\right) = \left(\begin{matrix}-\mathcal{L}_x \\ -g \end{matrix}\right) $$ so that the iteration proceeds to the new point $(x+s_x,\lambda+s_\lambda)$. In my experience with solving this linear system, setting the initial $\lambda$ to be large is usually enough to make the solver run smoothly.
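For concreteness, here is a minimal sketch of one such Newton-KKT step, using a made-up problem $f(x)=x_1^2+x_2^2$ with constraint $g(x)=x_1+x_2-1$ (the function name `newton_kkt_step` and the test problem are my own, not from the question):

```python
import numpy as np

# Illustrative problem: minimize f(x) = x1^2 + x2^2 subject to g(x) = x1 + x2 - 1 = 0.
def newton_kkt_step(x, lam):
    f_x = 2.0 * x                       # gradient of f
    g_x = np.array([[1.0, 1.0]])        # Jacobian of g (1 x 2)
    g = np.array([x[0] + x[1] - 1.0])   # constraint value
    L_x = f_x + g_x.T @ lam             # gradient of the Lagrangian in x
    L_xx = 2.0 * np.eye(2)              # Hessian of the Lagrangian (g is linear)

    # Assemble and solve the KKT system for the step (s_x, s_lambda)
    K = np.block([[L_xx, g_x.T],
                  [g_x, np.zeros((1, 1))]])
    rhs = np.concatenate([-L_x, -g])
    s = np.linalg.solve(K, rhs)
    return x + s[:2], lam + s[2:]

x, lam = np.array([3.0, -1.0]), np.array([10.0])  # large initial lambda
x, lam = newton_kkt_step(x, lam)
# For this quadratic problem a single Newton step lands on the solution:
# x = [0.5, 0.5], lambda = [-1.0]
```

Since both $f$ is quadratic and $g$ is linear here, one step reaches the exact KKT point; for general problems this step is iterated.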

Now, since the solution requires $\mathcal{L}_x=0$, in order to accelerate the process I am tempted to solve the following problem $$ \min_\lambda \frac{1}{2}|\mathcal{L}_x|^2 $$ whose minimizing $\lambda$ is $$ \lambda = -(g_xg_x^T)^{-1}g_xf_x $$ This $\lambda$ would be computed first, and only then $\mathcal{L}_{x}$ and $\mathcal{L}_{xx}$. But when I tried this trick, the solver failed miserably. Why is that so?
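As a sanity check on the closed form, one can verify numerically that it coincides with the least-squares solution of $g_x^T\lambda \approx -f_x$, and that the residual $\mathcal{L}_x$ it leaves is orthogonal to the constraint rows (the values of `f_x` and `g_x` below are made up for illustration):

```python
import numpy as np

# Random made-up gradient and full-row-rank Jacobian.
rng = np.random.default_rng(0)
K, J = 5, 2
f_x = rng.standard_normal(K)
g_x = rng.standard_normal((J, K))

# Closed form from the question: lambda = -(g_x g_x^T)^{-1} g_x f_x
lam = -np.linalg.solve(g_x @ g_x.T, g_x @ f_x)

# It agrees with the least-squares solution of g_x^T lambda ~= -f_x
lam_ls, *_ = np.linalg.lstsq(g_x.T, -f_x, rcond=None)
assert np.allclose(lam, lam_ls)

# The residual L_x = f_x + g_x^T lambda is the projection of f_x onto the
# orthogonal complement of col(g_x^T); it is orthogonal to the rows of g_x
# but generally nonzero.
L_x = f_x + g_x.T @ lam
assert np.allclose(g_x @ L_x, 0.0)
```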

1 Answer
If there are fewer constraints than variables, then $g: \mathbb{R}^K \rightarrow \mathbb{R}^J$ with $J<K$. If the constraints are not redundant, then $\operatorname{rank}(g_x(x))=J$.

The optimal values satisfy the first-order conditions $$ f_x(x)+g_x(x)^T\lambda=0,$$ a system of $K$ equations in the $K+J$ unknowns $(x,\lambda)$.

After inserting your expression for $\lambda$ into the first-order conditions, you obtain the reduced system of $K$ equations in the $K$ unknowns $x$: \begin{equation} \tag{1} (I_K-g_x^T(g_xg_x^T)^{-1}g_x)f_x(x)=0. \end{equation}

This system has infinitely many solutions. Indeed, the matrix $P:=g_x^T(g_xg_x^T)^{-1}g_x$ is the orthogonal projector onto $\operatorname{col}(g^T_x)$, and $\operatorname{rank}(P)=J<K$. The matrix $I_K-P$ is the orthogonal projector onto $\operatorname{col}^\perp (g^T_x)$, and its rank is $K-J$.
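These projector identities are easy to verify numerically for a made-up full-row-rank Jacobian (the values of $J$, $K$, and `g_x` below are illustrative):

```python
import numpy as np

# Made-up full-row-rank Jacobian g_x with J = 2 constraints, K = 5 variables.
rng = np.random.default_rng(1)
J, K = 2, 5
g_x = rng.standard_normal((J, K))

# P = g_x^T (g_x g_x^T)^{-1} g_x, the orthogonal projector onto col(g_x^T)
P = g_x.T @ np.linalg.solve(g_x @ g_x.T, g_x)

assert np.allclose(P @ P, P)     # idempotent
assert np.allclose(P, P.T)       # symmetric, hence an orthogonal projector
assert np.linalg.matrix_rank(P) == J
assert np.linalg.matrix_rank(np.eye(K) - P) == K - J
```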
If you consider only the system (1), the solution is underdetermined, since (1) contains only $K-J$ independent equations. But if you exploit the information contained in the $J$ constraints $g(x)=0$ and append them to system (1), you obtain $K$ independent conditions. This removes the indeterminacy and typically leaves a finite number of candidate solutions.
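The underdetermination is easy to see on a small example. Taking $f(x)=x_1^2+x_2^2$ and $g(x)=x_1+x_2-1$ (my own illustrative choice), every point on the line $x_1=x_2$ solves (1), and only the constraint singles out the true minimizer:

```python
import numpy as np

# Illustration: f(x) = x1^2 + x2^2, g(x) = x1 + x2 - 1, so g_x = [1, 1].
g_x = np.array([[1.0, 1.0]])
P = g_x.T @ np.linalg.solve(g_x @ g_x.T, g_x)
I = np.eye(2)

def residual_eq1(x):
    """Left-hand side of the projected system (1): (I - P) f_x(x)."""
    f_x = 2.0 * x
    return (I - P) @ f_x

# Every point x = (t, t) solves the projected system (1) ...
for t in [-3.0, 0.0, 7.5]:
    assert np.allclose(residual_eq1(np.array([t, t])), 0.0)

# ... but only t = 1/2 also satisfies the constraint g(x) = 0,
# which singles out the actual minimizer x = (0.5, 0.5).
```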