Element-wise Optimization with Lagrange Multiplier Vector

1.2k Views Asked by At

The problem is $$ \text{argmin}_{\mathbf{x}}f\left(\mathbf{x}\right)=\frac{1}{2}\mathbf{x}^{\top}\mathbf{x}\hspace{1em}\hspace{1em}\mathrm{st.}\hspace{1em} \mathbf{C}\mathbf{x}=\mathbf{t} $$ where $\mathbf{C}$ is a $n\times m$ matrix with rank $n$, $\mathbf{x}$ is a $m\times1$ and $\mathbf{t}$ is a $n\times1$ vectors. We can solve this problem by matrix derivatives:

$$ f\left(\mathbf{x}, \boldsymbol{\lambda}\right)=\frac{1}{2}\mathbf{x}^{\top}\mathbf{x}-\boldsymbol{\lambda}^{\top}\left(\mathbf{C}\mathbf{x}-\mathbf{t}\right) $$ then \begin{align*} \partial f/\partial\mathbf{x} & =\mathbf{x}-\mathbf{C}^{\top}\boldsymbol{\lambda}=\mathbf{0}\\ \partial f/\partial\boldsymbol{\lambda} & =-\left(\mathbf{C}\mathbf{x}-\mathbf{t}\right)=\mathbf{0} \end{align*} Multiplying the first equation by $\mathbf{C}$ from left and applying $\mathbf{C}\mathbf{x}=\mathbf{t}$ we find $\boldsymbol{\lambda}=\left(\mathbf{C}\mathbf{C}^{\top}\right)^{-1}\mathbf{t}$. Putting this $\boldsymbol{\lambda}$ in the first equation, we get $$ \mathbf{x}=\mathbf{C}^{\top}\left(\mathbf{C}\mathbf{C}^{\top}\right)^{-1}\mathbf{t}. $$

My question is "How can I get the solution by element wise derivatives?". That is, by using derivatives wrt $x_{1},x_{2},\ldots,x_{m}$.

The problem I confront is that I can not find $\boldsymbol{\lambda}$ vector (and so ${\lambda_j}$ values) as in the solution above when the derivatives are with respect to one variable $x_j$.

2

There are 2 best solutions below

1
On

The complete and elegant solution you have shows that the $\lambda_i$ are the uniquely determined solutions of a certain system of linear equations, whereby the matrix of this system is $CC^\top$, assumed to be of rank $n$. You cannot hope that solving the original problem "the pedestrian way" will give you a simpler access to these $\lambda_i$.

0
On

Since the constraint is a linear equality, you can "solve for $x$" by introducing a new unconstrained variable $u$ $$\eqalign{ \def\c#1{\color{red}{#1}} \def\qiq{\quad\implies\quad} Cx &= t \qiq x &= (I-C^+C)\,\c{u} \;+\; C^+t \\ &&= P\c{u} \;+\; C^+t \\ }$$ where $C^+$ is the pseudoinverse of $C,\,$ and $P$ is the nullspace projector which is an orthoprojector $$P^2=P=P^T$$ This results in an unconstrained problem in terms of $u$.

First, calculate the gradient wrt $u$ $$\eqalign{ \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \phi &= \frac 12\,(x^Tx) \\ d\phi &= x^T\c{dx} \;=\; x^T(\c{P\,du}) \;=\; (Px)^T\,du \\ \grad{\phi}{u} &= Px \\ }$$ Then the usual prescription of setting the gradient to zero yields the optimal solution as $$\eqalign{ \big(I-C^+C\big)\,x &= 0 \qiq x = C^+t \\ }$$