Solving a modified norm minimization problem


I have the following minimization problem in $x \in \mathbb{R}^n$:

$$\begin{array}{ll} \text{minimize} & \|x\|_2 - c^T x\\ \text{subject to} & Ax = b\end{array}$$

where $A \in \mathbb{R}^{m \times n}$ is right-invertible, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$.

I tried to solve this using Lagrange multipliers, but am unable to find a closed-form solution for $x$, because the derivative of $\|x\|_2$ with respect to $x$, which is $\frac{x}{\|x\|_2}$, contains $\sqrt{x^Tx}$ in the denominator.

Any help would be appreciated.

There are 3 answers below.


Using the derivatives of the Lagrange function $\mathcal{L}(x,\lambda) =\|x\|-c^Tx-\lambda^T(Ax-b)$, we get $$ \frac{x}{\|x\|} - c - A^T \lambda = 0, $$ where $\lambda\in\mathbb{R}^m$. This results in $x = (c+A^T\lambda)\,\|x\|$.

Inserting this $x$ into $Ax=b$ gives $A(c+A^T\lambda)\,\|x\| = b$, i.e. $$ AA^T\lambda = \frac{b}{\|x\|} - Ac \quad\Longrightarrow\quad \lambda = \left(AA^T\right)^{-1}\left(\frac{b}{\|x\|} - Ac\right), $$ where $AA^T$ is invertible because $A$ is right-invertible.

Putting this $\lambda$ back into $x = (c+A^T\lambda)\,\|x\|$ yields $$ x = \left(c+A^T\left(AA^T\right)^{-1}\left(\frac{b}{\|x\|} - Ac\right)\right)\,\|x\| = \left(I-A^T\left(AA^T\right)^{-1}A\right) c\, \|x\| + A^T\left(AA^T\right)^{-1} b. $$

Define $v = (I-A^T (AA^T)^{-1}A)c$, the projection of $c$ onto the kernel of $A$, and $x_0 = A^T\left(AA^T\right)^{-1} b$, so that $$ x = v\, \|x\| + x_0. $$

We now have $$ \|x\|^2 = x^Tx = (v \|x\| + x_0)^T(v \|x\| + x_0). $$ It can easily be shown that $x_0^Tv = 0$. Therefore, $$ (v^Tv-1)\|x\|^2 + x_0^Tx_0 = 0. $$ This is a quadratic equation for $\|x\|$: provided $\|v\|<1$, it has the real positive solution $\|x\| = \|x_0\|/\sqrt{1-\|v\|^2}$, which we plug into $x = v\, \|x\| + x_0$.

I have only addressed the possibility of obtaining a closed form for $x$, not whether this $x$ is actually a valid solution of the problem. The resulting $x$ fulfills the necessary condition for solutions at differentiable points of the Lagrange function, but it does not necessarily fulfill the sufficient conditions. If we can only get complex solutions, the problem is unbounded. Note also that the Lagrange function is not differentiable at $x=0$, so this point must be addressed separately.
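The closed form above is easy to check numerically. Here is a minimal sketch (assuming NumPy; the problem data are random, and $c$ is rescaled so that $\|v\| < 1$, which guarantees a real positive root of the quadratic):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))          # right-invertible with probability 1
b = rng.standard_normal(m)
c = rng.standard_normal(n)
c *= 0.5 / np.linalg.norm(c)             # keep ||c|| small so that ||v|| < 1

Ap = A.T @ np.linalg.inv(A @ A.T)        # A^T (A A^T)^{-1}, valid since A is right-invertible
P = np.eye(n) - Ap @ A                   # projector onto ker(A)
v = P @ c                                # projection of c onto the kernel of A
x0 = Ap @ b                              # minimum-norm solution of Ax = b

# (v^T v - 1) ||x||^2 + x0^T x0 = 0  =>  ||x|| = ||x0|| / sqrt(1 - ||v||^2)
norm_x = np.linalg.norm(x0) / np.sqrt(1 - v @ v)
x = v * norm_x + x0

print(np.allclose(A @ x, b))                   # constraint holds
print(np.isclose(np.linalg.norm(x), norm_x))   # ||x|| matches the scalar we solved for
```

The second check confirms the self-consistency that the derivation relies on: the $\|x\|$ obtained from the quadratic really is the norm of the reconstructed $x$ (this uses $x_0^Tv=0$).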


Let $A^+$ be the Moore–Penrose inverse of $A$. Note that $x=x_0+u$, where $u$ varies in the image of the symmetric matrix $I_n-A^+A$, a space $F$ of dimension $n-m$ (cf. the comment by Hans). Moreover, $E=\ker(I_n-A^+A)$ is the orthogonal complement of $F$; choosing an orthonormal basis adapted to the decomposition $E\bigoplus F$, we may assume that

$x_0=[a,0]^T$, $u=[0,x]^T$, $c=[w,v]^T$ (here the symbol $x$ denotes the $F$-component of $u$). Up to the additive constant $c^Tx_0$, the objective becomes

$g(x)=||x_0+u||-c^Tu=\sqrt{||a||^2+||x||^2}-v^Tx$.

Now $Dg_x=0$ iff $x/\sqrt{||a||^2+||x||^2 }=v$. Then $x=\lambda v$, where $\lambda=\sqrt{||a||^2+\lambda ^2||v||^2}\geq 0$.

Finally, if $||v||\geq 1$, there is no critical point and the minimum is not attained. If $||v||<1$, then

$\lambda=\dfrac{||a||}{\sqrt{1-||v||^2}}$.

It remains to show that the above $u$ is associated with the minimum of $g$; that is your business.
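As a quick numerical sanity check of the fixed-point relation for $\lambda$, here is a sketch with made-up components $a\in E$ and $v\in F$ (assuming NumPy; the vectors are illustrative, chosen so that $||v||<1$):

```python
import numpy as np

# Hypothetical components: a is the E-part of x_0, v is the F-part of c, ||v|| < 1
a = np.array([2.0, 1.0])
v = np.array([0.6, 0.0, 0.3])
na, nv = np.linalg.norm(a), np.linalg.norm(v)

lam = na / np.sqrt(1 - nv**2)            # claimed closed form for lambda
# lambda must solve the fixed-point equation lambda = sqrt(||a||^2 + lambda^2 ||v||^2)
print(np.isclose(lam, np.sqrt(na**2 + lam**2 * nv**2)))
```

Squaring the fixed-point equation and rearranging gives $\lambda^2(1-||v||^2)=||a||^2$, which is exactly the closed form above.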


The linear equation $Ax=b$ has the general solution $$x = A^+b + Pw$$ where $A^+$ is the Moore–Penrose inverse, $P=(I-A^+A)$ is the orthogonal projector onto the nullspace of $A$, and $w$ is an arbitrary vector. Using this, we can replace the constrained problem in terms of $x$ with an unconstrained problem in terms of $w$.

For convenience, define the variables $$\eqalign{ y &= Pc, \quad &Py = y \cr x_b &= A^+b, \quad &x_w = Pw = Px, \quad &x &= x_b + x_w \cr \lambda_b^2 &= x_b:x_b, \quad &\lambda_w^2 = x_w:x_w, \quad &\lambda^2 &= \lambda_b^2 + \lambda_w^2 \,= x:x \cr }$$ Note the following implications
$\quad P$ is ortho-projector $\implies P^2=P=P^T$
$\quad A$ is right-invertible $\implies A^+=A^T(AA^T)^{-1}\,$ and $\,AA^+=I$
$\quad AP\,=0\,\implies Ax=(AA^+b+0)=b$
$\quad PA^+\!=0\implies Px=Pw\,\,$ and $\,\,x_b:x_w=0$
$\quad\lambda^2\!=x\!:\!x\implies \lambda\,d\lambda = x:dx$

Start by calculating the gradient of the cost function. $$\eqalign{ \phi &= \lambda - c:x \cr d\phi &= d\lambda - c:dx = \Big(\frac{x}{\lambda}-c\Big):dx \cr &= \Big(\frac{x-\lambda c}{\lambda}\Big):P\,dw \cr &= \frac{Pw-\lambda y}{\lambda}:dw \cr \frac{\partial\phi}{\partial w} &= \frac{Pw-\lambda y}{\lambda} \cr }$$ Setting the gradient to zero, we can solve for the vector $\,\,w=\lambda y$.

Substitute this value of $w$ into the $\lambda$-relationships. $$\eqalign{ \lambda_w^2 &= Pw:Pw = \lambda Py:\lambda Py = \lambda y:\lambda y \cr \lambda^2 &= \lambda_b^2 + \lambda_w^2 = \lambda_b^2 + \lambda^2(y:y) \cr \lambda^2 &= \frac{\lambda_b^2}{1-y:y} = \frac{(A^+b):(A^+b)}{1-(Pc:Pc)} \cr }$$ Choosing the real positive square root (if it exists) completes the solution $$\eqalign{ x &= A^+b + \lambda y \cr\cr }$$ NB: Many of the steps above utilize the trace/Frobenius product $$A:B = {\rm Tr}(A^TB)$$