Why does any dual optimal point provide a separating hyperplane between a point and its projection on a convex set?


On page 400, in Chapter 8 (Geometric Problems) of Boyd & Vandenberghe's Convex Optimization, there is a discussion of finding a separating hyperplane between a point and its projection onto a convex set. The authors state that for norms other than the Euclidean norm, the hyperplane is found via Lagrangian duality.

The example is stated as follows. (Please pardon the spacing; I'm still figuring out how to use MathJax.)

$$\begin{array}{ll} \text{minimize} & \Vert y\Vert\\ \text{subject to} & f_i (x) \le 0, \quad i =1,\dots,m\\ & Ax = b\\ & x_0 - x = y\end{array}$$

  1. The Lagrangian of the problem is therefore stated as

$$ L(x,y,\lambda, \mu, \nu) = \Vert y \Vert + \sum_{i=1}^m \lambda_if_i(x)+\nu^T(Ax-b)+\mu^T(x_0-x-y) $$

  2. The Lagrangian dual function (which I believe is the minimum value of the Lagrangian over $y$, obtained by setting $\nabla_y L(x,y,\lambda,\mu,\nu) = 0$) is then stated as $$g(\lambda, \mu, \nu) = \begin{cases} \inf_x\left(\sum_{i=1}^m \lambda_if_i(x)+\nu^T(Ax-b)+\mu^T(x_0-x)\right) & \Vert\mu\Vert_*\leq 1\\ -\infty & \text{otherwise.}\end{cases}$$

  3. The dual problem (the maximization of the dual function) is stated as follows.

$$\begin{array}{ll} \text{maximize} & \mu^Tx_0+\inf_x\left(\sum_{i=1}^m \lambda_if_i(x)+\nu^T(Ax-b)-\mu^Tx\right)\\ \text{subject to} & \lambda \succeq 0, \quad \Vert \mu \Vert_* \leq 1\end{array}$$
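(The case split in the dual function comes from minimizing the Lagrangian over $y$ alone; the book leaves this step implicit, so here is my own filling-in of it. The $y$-dependent part of the Lagrangian is $\Vert y\Vert - \mu^T y$, and

$$\inf_y \left(\Vert y\Vert - \mu^T y\right) = \begin{cases} 0 & \Vert\mu\Vert_* \leq 1\\ -\infty & \text{otherwise,}\end{cases}$$

since if $\Vert\mu\Vert_* \leq 1$ then $\mu^T y \leq \Vert\mu\Vert_*\Vert y\Vert \leq \Vert y\Vert$ for every $y$, so the expression is nonnegative and equals $0$ at $y=0$; while if $\Vert\mu\Vert_* > 1$ there is some $y$ with $\mu^T y > \Vert y\Vert$, and scaling $y \mapsto ty$ with $t \to \infty$ sends the expression to $-\infty$.)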

Question: Why is it that, if $ \Vert \mu \Vert_* \leq 1$, we have $\Vert y \Vert = \mu^Ty$ at the infimum? I'm assuming this is the case because the two terms appear to cancel as we go from the Lagrangian to the dual function. I assume I'm just missing something obvious, but I'm trying to gain some intuition on this, either geometrically or algebraically. Thanks.


Best answer

From your derivation of the dual function, the optimality condition for minimizing the Lagrangian over $y$ can be written as $$ \mu \in \partial \|y\| $$ where $\partial g(y)$ is the subdifferential of $g$ at $y$, defined as the set of supporting vectors: $$ \partial g(y) = \{z : g(x)-g(y) - z^T(x-y) \geq 0\;\;\forall x\}. $$ I am now going to prove that $\partial \|y\| = \{z : z^Ty = \|y\|,\; \|z\|_* \leq 1\}$. This is a known property, and you can find proofs of it elsewhere; but for fun, I'll do it constructively, so the technique is clearer. If you want to stop here, this property alone is enough to explain the dual condition on $\mu$.
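As a concrete sanity check of this characterization (my own example, not from the book): for the $\ell_1$ norm, whose dual norm is $\ell_\infty$, the vector $z = \operatorname{sign}(y)$ satisfies both conditions and the defining subgradient inequality. A quick numerical sketch in NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.array([2.0, -3.0, 0.5])

# For the l1 norm (dual norm: l-infinity), z = sign(y) is a subgradient at y
# when all entries of y are nonzero.
z = np.sign(y)

# The two conditions in the claimed characterization of the subdifferential:
assert np.isclose(z @ y, np.abs(y).sum())  # z^T y = ||y||_1
assert np.max(np.abs(z)) <= 1.0            # ||z||_inf <= 1

# The defining subgradient inequality ||x||_1 - ||y||_1 >= z^T (x - y),
# checked on random points x:
for _ in range(1000):
    x = 10 * rng.normal(size=3)
    assert np.abs(x).sum() - np.abs(y).sum() >= z @ (x - y) - 1e-12
```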

Plugging in $g(y) = \|y\|$, $$ \partial \|y\| = \{z : \|x\|-\|y\| \geq z^T(x-y),\forall x\}. $$ We can think about the worst-case $x$ by viewing this as a two-player game between $x$ and $z$, and first considering the moves for $x$. In other words, we want to first minimize
$$ \min_x\; \|x\|- \|y\| -z^T(x-y) \iff \min_x \; \|x\|-z^Tx. $$ We can decompose $x = \xi \hat x$, where $\xi = \|x\|$ and $\hat x = \frac{x}{\|x\|}$, so $\|\hat x\| = 1$. Then the problem can be written as $$ \min_{\xi\geq 0} \;\min_{\hat x :\, \|\hat x\|=1}\; \xi(1 -z^T\hat x), $$ and over the unit sphere the term $z^T\hat x$ is maximized when $\hat x$ is aligned with $z$ (with respect to the norm), giving $z^T\hat x = \|z\|_*$ by the definition of the dual norm. This settles the inner problem over $\hat x$, and we are left with $$ \max_z \; \min_{\xi\geq 0} \; \xi(1 -\|z\|_*). $$ From here we see that if $\|z\|_*> 1$, then $\xi$ will play $+\infty$ and drive the value to $-\infty$; so $z$ must play something with $\|z\|_* \leq 1$, in which case $\xi$ is forced to play 0. Therefore, if we force $\|z\|_*\leq 1$, the worst-case $x$ is 0.
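The alignment step (that the best $\hat x$ achieves $z^T\hat x = \|z\|_*$) is exactly the definition of the dual norm. Here is a quick numerical illustration using the $\ell_1$/$\ell_\infty$ pair (my own NumPy sketch, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(2)
z = np.array([0.3, -2.0, 1.1])

# For the l1 norm, the dual norm is l-infinity:
#   sup { z^T x : ||x||_1 = 1 } = ||z||_inf.
dual = np.max(np.abs(z))

# Random points on the l1 unit sphere never beat ||z||_inf ...
best = -np.inf
for _ in range(5000):
    x_hat = rng.normal(size=3)
    x_hat /= np.abs(x_hat).sum()   # normalize so ||x_hat||_1 = 1
    best = max(best, z @ x_hat)
assert best <= dual + 1e-12

# ... and the aligned vertex of the l1 ball attains it exactly.
i = np.argmax(np.abs(z))
x_star = np.zeros(3)
x_star[i] = np.sign(z[i])
assert np.isclose(z @ x_star, dual)
```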

Now we have reduced the set to $$ \partial \|y\| = \{z : \|z\|_* \leq 1, \; z^Ty \geq \|y\|\}. $$ But since $z^Ty \leq \|z\|_*\|y\| \leq \|y\|$, the inequality $z^Ty \geq \|y\|$ can only hold with equality, and it requires either $\|z\|_* = 1$ or $\|y\| = 0$; hence the generic statement $z^Ty = \|y\|$.
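To tie this back to the dual constraint on $\mu$: the same mechanism (the infimum over $y$ is 0 when the dual norm is at most 1, and $-\infty$ otherwise) can be checked numerically. A sketch using the $\ell_1$ norm, whose dual norm is $\ell_\infty$ (my own NumPy example, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

def lagrangian_y_part(mu, y):
    # The y-dependent part of the Lagrangian: ||y||_1 - mu^T y
    return np.abs(y).sum() - mu @ y

# Case 1: ||mu||_inf <= 1. The infimum over y is 0, attained at y = 0.
mu_feasible = np.array([0.5, -0.8, 1.0])
vals = [lagrangian_y_part(mu_feasible, rng.normal(size=3)) for _ in range(1000)]
assert min(vals) >= 0
assert lagrangian_y_part(mu_feasible, np.zeros(3)) == 0

# Case 2: ||mu||_inf > 1. Scaling y along a coordinate where |mu_i| > 1
# drives the value to -infinity, so the dual function is -inf there.
mu_infeasible = np.array([0.5, -0.8, 1.5])
y = np.array([0.0, 0.0, 1.0])  # aligned with the violating entry
scaled = [lagrangian_y_part(mu_infeasible, t * y) for t in (1, 10, 100)]
# scaled decreases without bound as t grows: [-0.5, -5.0, -50.0]
```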