Normal cone of $\mathbb{R}^n_+$ and $S^n$?


I'm trying to solve the optimization problem

$$\min_x \{f(x) + \delta_X(x)\}$$

where $f$ is a differentiable function and $\delta$ is the indicator function

$$\delta_X(x) = \begin{cases}0, & x \in X \\ +\infty, & x \notin X\end{cases}$$

for the cases where $X$ is the positive orthant $\mathbb{R}^n_+$ or the cone of positive semidefinite matrices $S^n_+$.

I am aware that the optimality conditions give us that

$$-\nabla f(x) \in N_X(x)$$

where $N$ is the normal cone. But I'm having some trouble calculating the normal cone of those sets.
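As a concrete sanity check of this optimality condition (my own example, not from the question): take $f(x) = \tfrac12\|x-a\|^2$ over $\mathbb{R}^n_+$, whose minimizer is the componentwise positive part of $a$. Then $-\nabla f$ at the minimizer should satisfy the condition above.

```python
import numpy as np

# Hypothetical example: f(x) = 0.5*||x - a||^2 minimized over the
# nonnegative orthant.  The minimizer is x* = max(a, 0) componentwise,
# and -grad f(x*) = a - x* should lie in the normal cone N_X(x*).
a = np.array([1.0, -2.0, 0.5, -0.3])
x_star = np.maximum(a, 0.0)          # projection of a onto R^n_+
g = -(x_star - a)                    # -grad f(x*) = a - x*

# Normal-cone membership for R^n_+: g <= 0, and g_i = 0 wherever x*_i > 0.
assert np.all(g <= 1e-12)
assert np.allclose(g[x_star > 0], 0.0)
assert abs(x_star @ g) < 1e-12       # complementarity: x*^T g = 0
print("optimality condition verified")
```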

Is what I did right? For the case of $\mathbb{R}^n_+$ I found that if the point $x$ is in the interior of $X$ then $N_X(x) = \{0\}$, and if it is on the boundary, then $N_X(x)$ is the cone of vectors that are zero on the coordinates where $x$ is positive and nonpositive on the coordinates where $x$ is $0$. Is that right?

And how do I proceed in the $S^n_+$ case? The definition of the normal cone isn't helping me here.


There are 2 solutions below.

BEST ANSWER

As you have found, if $x\in\mathop{\textrm{Int}} X$, then $N_X(x)=\{0\}$, and that's true of any set, not just $\mathbb{R}^n_+$.

For a point on the boundary, I think it is easier to derive the tangent cone, which is the closure of the set of directions you can "move" in from $x$ while remaining in the set: $$T_X(x) = \mathop{\textrm{cl}}\left\{ v ~\middle|~ \exists t>0:~ x+tv \in X \right\}$$ The normal cone is the polar of the tangent cone: $$N_X(x) = T_X(x)^\circ = \left\{ z ~\middle|~ \langle z,w\rangle\leq 0~\forall w\in T_X(x) \right\}$$

For instance, for $\mathbb{R}^n_+$, $$v\in T_X(x) \quad\Longleftrightarrow\quad \begin{array}{ll} v_i \in\mathbb{R} & x_i > 0 \\ v_i \geq 0 & x_i = 0 \end{array} \quad i=1,2,\dots,n$$ It's not difficult to see why: when $x_i>0$, you can move in either the positive or negative direction and remain within the set; but when $x_i=0$, you can only move in the positive direction. The polar of the tangent cone is $$z\in N_X(x)=T_X(x)^\circ \quad\Longleftrightarrow\quad \begin{array}{ll} z_i = 0 & x_i > 0 \\ z_i \leq 0 & x_i = 0 \end{array} \quad i=1,2,\dots,n$$ Confirm that this gives the proper result $N_X(x)=\{0\}$ for $x\in\mathop{\textrm{Int}} X$.

So the normal cone consists of nonpositive vectors that are zero where $x_i$ is nonzero. Put another way, we have $$x\in \mathbb{R}^n_+,~z\in N_{\mathbb{R}^n_+}(x) \quad\Longrightarrow\quad -z\in\mathbb{R}^n_+,~x^Tz=0.$$ These are similar to the standard complementarity conditions for linear programming.
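This characterization can be checked numerically. The sketch below (my own illustration, with made-up numbers) uses the standard fact that for a closed convex set $X$, $z \in N_X(x)$ exactly when the Euclidean projection of $x+z$ onto $X$ equals $x$; for $X = \mathbb{R}^n_+$ the projection is the componentwise positive part.

```python
import numpy as np

# For closed convex X, z is in N_X(x) iff proj_X(x + z) == x.
# For X = R^n_+, proj_X is the componentwise positive part.
x = np.array([2.0, 0.0, 1.0, 0.0])    # boundary point of R^n_+
z = np.array([0.0, -3.0, 0.0, -0.5])  # z_i = 0 where x_i > 0, z_i <= 0 where x_i = 0

proj = np.maximum(x + z, 0.0)
assert np.allclose(proj, x)           # so z is in N_X(x)

# Equivalent complementarity form: -z in R^n_+ and x^T z = 0.
assert np.all(-z >= 0) and abs(x @ z) < 1e-12

# A vector violating the characterization is rejected:
bad = np.array([1.0, 0.0, 0.0, 0.0])  # nonzero where x_1 > 0
assert not np.allclose(np.maximum(x + bad, 0.0), x)
```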

So that's the easy one. What about the positive semidefinite cone $S^n_+$? The intuition on the tangent cone is identical to that of $\mathbb{R}^n_+$, except that you have to consider the eigendirections. That is, let $P\in X$ be a positive semidefinite matrix, and let $P=Q\mathop{\textrm{diag}}(\lambda) Q^T$ be its eigenvalue decomposition, where $Q$ is orthogonal and $\lambda\in\mathbb{R}^n_+$ is the vector of eigenvalues. Then when $\lambda_i>0$, we can move in either the positive or negative eigendirection; but when $\lambda_i=0$, we can only move in the positive eigendirection. So we have this: $$T_{X}(P) = Q \mathop{\textrm{diag}}\left( T_{\mathbb{R}^n_+}(\lambda) \right) Q^T$$

Once you satisfy yourself with that, the normal cone follows by taking the polar: $$N_{X}(P) = T_{X}(P)^\circ = - T_{X}(P)^* = Q \mathop{\textrm{diag}}\left( N_{\mathbb{R}^n_+}(\lambda) \right) Q^T$$ So the normal cone is a set of negative semidefinite matrices constructed to have a "complementary" eigenspace to the input point. The corresponding complementarity conditions are $$P\in S^n_+,~R\in N_{S^n_+}(P)\quad\Longrightarrow\quad -R\in S^n_+,~\mathop{\textrm{Tr}}(PR)=0.$$
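A numerical sketch of this construction (my own check, with arbitrary eigenvalues): build $R = Q\,\mathop{\textrm{diag}}(\mu)\,Q^T$ with $\mu \in N_{\mathbb{R}^n_+}(\lambda)$, and verify that $R$ is negative semidefinite, that $\mathop{\textrm{Tr}}(PR)=0$, and that $R$ satisfies the defining inequality of the normal cone against random PSD matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A rank-2 PSD matrix P = Q diag(lam) Q^T with Q orthogonal.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([3.0, 1.0, 0.0, 0.0])
P = Q @ np.diag(lam) @ Q.T

# mu lies in N_{R^n_+}(lam): zero where lam > 0, nonpositive where lam = 0.
mu = np.array([0.0, 0.0, -2.0, -0.7])
R = Q @ np.diag(mu) @ Q.T

assert abs(np.trace(P @ R)) < 1e-10            # complementarity: Tr(PR) = 0
assert np.all(np.linalg.eigvalsh(R) <= 1e-10)  # R is negative semidefinite

# Variational-inequality check: <R, S - P> <= 0 for random PSD S.
for _ in range(100):
    B = rng.standard_normal((n, n))
    S = B @ B.T                                # random PSD matrix
    assert np.trace(R @ (S - P)) <= 1e-10
```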


Here is a pedantic alternative to Michael's slick computation of the normal cone at a point of the cone of positive semidefinite symmetric matrices.

I am assuming below that we are working in the space of symmetric matrices.

Suppose $P \in S^n_+$. Then I claim that $N_{S^n_+}(P) = C = \{ A \mid A \le 0,\ AP = 0 \}$, where $A \le 0$ means $A$ is negative semidefinite.

Suppose $A \in N_{S^n_+}(P) = \{ A \mid \langle A, S-P \rangle \le 0,\ \forall S \in S^n_+ \}$. Since $x^T A x = \operatorname{tr} (A x x^T)$, letting $S = P + x x^T$ shows that $x^T A x \le 0$ for every $x$, hence $A \le 0$. Now suppose $y \in \mathcal{R}(P)$, the range of $P$. Then $P + t y y^T \ge 0$ for all $t$ in a neighbourhood of $0$, so letting $S = P + t y y^T$ (for either sign of $t$) gives $y^T A y = \operatorname{tr} (A y y^T) = 0$, and since $A \le 0$ this forces $y \in \ker A$. In particular, $AP = 0$.

Now suppose $A \in C$. Choose $S \in S^n_+$ and note that we can write $S = \sum_k \lambda_k u_k u_k^T$ for some $\lambda_k \ge 0$. Then $\langle A, S-P \rangle = \operatorname{tr} (A(S-P)) = \operatorname{tr} (AS) = \sum_k \lambda_k \operatorname{tr} (Au_ku_k^T) = \sum_k \lambda_k u_k^T A u_k \le 0$, so $A \in N_{S^n_+}(P)$, which completes the proof.
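A small numerical illustration of this characterization (my own, with hypothetical numbers): take $A = -uu^T$ with $u \in \ker P$, so that $A \le 0$ and $AP = 0$, and check the defining inequality $\langle A, S-P \rangle \le 0$ against random PSD matrices $S$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A rank-2 PSD matrix P with a known orthonormal eigenbasis Q.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = Q @ np.diag([2.0, 1.0, 0.0, 0.0]) @ Q.T

u = Q[:, 3]               # eigenvector of P with eigenvalue 0, so u in ker(P)
A = -np.outer(u, u)       # A <= 0 (negative semidefinite) and A P = 0

assert np.allclose(A @ P, 0.0)
for _ in range(100):
    B = rng.standard_normal((n, n))
    S = B @ B.T                       # arbitrary PSD matrix
    # <A, S - P> = tr(A(S-P)) = tr(AS) = -u^T S u <= 0
    assert np.trace(A @ (S - P)) <= 1e-10
```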