The steepest descent direction constrained to non-negative variables

121 Views Asked by At

Recently, another user asked about the steepest descent direction constrained to non-negative variables, but using the $L_1$ norm to avoid null directions, see this link. His ideas led to the following optimization problem: $$ \begin{align} \tag{*} \min_{d \in \mathbb{R}^{n}} &\ c^{T} d \\ \text{subject to } & \text{e}^{T} d = 1 \\ & d\geq 0, \end{align} $$ in which $\text{e} = (1,1,1, \cdots ,1)^{T}.$ However, this direction of descent might be thought as not a good one, since it tends to look for directions whose coordinates are sparse. Hence, if one wants to minimize in the positive octant using a descent algorithm, it would be preferable to look for non-zero descent directions whose Euclidean norm is 1. That is, instead of looking for directions of (*), one should look for directions $d$ solving the problem $$ \begin{align} \tag{**} \min_{d \in \mathbb{R}^{n}} &\ c^{T} d \\ \text{subject to } & \dfrac{1}{2}\|d\|^2_{2} = \dfrac{1}{2} \\ & d\geq 0, \end{align} $$ where $\| \|_2$ means the Euclidean norm. The question is:

What are the directions $d$ which solves the problem (**)?

It does not seem hard to find good candidates for minimizers, but it does not seem trivial also -- after some thought.

2

There are 2 best solutions below

2
On BEST ANSWER

The Non-Negative Least-Squares problem is $$\min_x\;\|Ax-b\|^2_2 \qquad{\rm s.t.}\;\;x_k\ge 0$$ Remapping the variables to $\;\left(A=c^T,\;b=[\,{\tt1}\,],\;x=d\,\right)\;$ recovers the initial problem.

The standard solver for this problem is Matlab's lsqnonneg which has been ported to many other languages (R, Octave, Scilab, Julia, etc).

An algorithm for very large problems is described in this paper by Dhillon, Kim & Sra.

Another possibility is the Sequential Coordinate-wise algorithm by Frank, Navara & Hlavac.

0
On

In the case of $c\geq 0$, the solution is relatively easy. Any normalized vector $d$ such that $c^{T}d=0$ and $d\geq 0$ solves the problem.

In the case in which, for some $i$, $c_i<0$, we need the KKT conditions. First, It is not hard to find that every feasible point of (**) satisfies a constraint qualification, for example, the LICQ conditions. Hence, every minimizer satisfies the KKT conditions. This way, we can write the KKT conditions for a minimizer of $(**)$ as $$\begin{align} c-\lambda x-\mu = &\ 0, \tag{stationarity}\\ \mu \geq 0 \text{ and } \lambda \in &\ \mathbb{R}\tag{dual feasibility}\\ x^{T} \mu = &\ 0, \tag{complementary slackness} \\ \|x\|_2^2 = 1, \text{and } x \geq & \ 0 \tag{primal feasibility}, \end{align}$$

Multiplying the stationarity condition by $x^{T}$ on the left, we have $c^T x- \lambda = 0.$ Thus $c^{T} x =\lambda.$

On the other hand, letting $x_j>0$, by the complementary slackness, $\mu_j = 0$, and, by the stationarity condition, $$ c_j = c^{T} x\ x_j \tag{***}.$$ On the other hand, since $c^{T} x \leq c^{T} e_i = c_i<0$. This means, by (***), $c_j<0$ whenever $x_i>0.$ Also, note that it can't be $c_j<0$ and $x_j=0$ by the stationarity condition. Otherwise, $\mu_i<0$. Hence, $$c_j<0 \text{ if and only if } x_j>0.$$

Also, we have that, multiplying (***) by $c_j$ and summing the things up, we have $\sum_{x_j>0} c_j^2 = (c^{T} x)^2$. Consequently, again by (***), $$ x_j = \left\lbrace \begin{align} - \dfrac{c_j}{\sqrt{\sum_{x_j>0}c_j^2}}, & \text{ if } x_i>0\\ 0, & \text{ if } x_j=0. \end{align}\right. $$ Combining all the results, this vector can be rewritten as $$x = - \dfrac{c_{(-)}}{{\|c_{(-)}\|_{2}}},$$ where $c_{(-)} = (\text{min}\{c_1,0\}, \text{min}\{c_2,0\}, \text{min}\{c_3,0\}, \cdots, \text{min}\{c_n,0\})$.

P.S.: It was left the subtle case in which $c>0$, since $c^{T}d=0$ and $d\geq 0$, implies $d=0$ in this case. Similarly to what we have done, it's not hard to prove that $c^{T}x = \sqrt{\sum_{x_j>0}c_j^2}>0$, for every global minimizer $x$. Since the global minimizer is the one which minimizes $\sqrt{\sum_{x_j>0}c_j^2}$ for all possible $x$, the solution is $$x=\text{e}_{\ell},$$ where $\ell \in \text{argmin}\{c_j\}$ and $e_\ell=(0,0,0,\cdots, 1, \cdots 0)$ -- the $1$ is in the $\ell$-th coordinate.