Orthogonal Projection onto the Intersection of Convex Sets


The general method of Projection onto Convex Sets (POCS) can be used to find a point in the intersection of a number of convex sets, i.e.

$$ \text{find } x \in \mathbb{R}^N \text{ s.t. } x \in \bigcap_i C_i $$

This method finds some feasible point in the intersection of the convex sets. Now my question is: is there a similar method that can find the point $x^*$ of minimum norm instead, i.e. solve

$$ x^* = \arg \min_{x \in \mathbb{R}^N} \Vert x\Vert \text{ s.t. } x \in \bigcap_i C_i$$

or more generally find the projection of a point $a \in \mathbb{R}^N$ onto the intersection of the convex sets i.e. solve

$$ x^* = \arg \min_{x \in \mathbb{R}^N} \Vert x - a\Vert \text{ s.t. } x \in \bigcap_i C_i$$ for some $a \in \mathbb{R}^N$?

Edit

The problem we are trying to solve is a 3D tomography reconstruction problem, so the variable $x$ takes gigabytes of RAM. There is already a POCS algorithm (a variant of the Algebraic Reconstruction Technique (ART)) that finds a feasible point. So is there a way to use it as a black box, or to adapt it, so that it finds a minimum-norm solution instead?

There are 4 answers below.

Answer 1 (score: 18)

This can be done fairly easily using proximal algorithms. Let $\delta_i$ be the indicator function of $C_i$: \begin{equation} \delta_i(x) = \begin{cases} 0 & \text{if } x \in C_i, \\ \infty & \text{otherwise.} \end{cases} \end{equation} Your optimization problem can be written as \begin{equation} \text{minimize} \quad \| x - a \| + \sum_i \delta_i(x). \end{equation} The objective function is a sum of functions that have easy proximal operators. Thus, you can use the Douglas-Rachford method (together with the consensus trick) to solve this optimization problem.

The Douglas-Rachford method is an iterative method for minimizing the sum of two convex functions, each of which has an easy proximal operator. Since we have a sum of more than two functions, it may seem that Douglas-Rachford does not apply. However, we can get around this by using the consensus trick. We reformulate our problem as

\begin{align*} \text{minimize} & \quad \underbrace{\|x_0 - a \| + \sum_i \delta_i(x_i)}_{f(x_0,x_1,\ldots,x_m)} + \underbrace{\delta_S(x_0,x_1,\ldots,x_m)}_{g(x_0,x_1,\ldots,x_m)} \end{align*} where \begin{equation} S = \{(x_0,x_1,\ldots,x_m) \mid x_0 = x_1 = \cdots = x_m \} \end{equation} and $\delta_S$ is the indicator function of $S$. The variables in this reformulated problem are the vectors $x_0,x_1,\ldots,x_m$. The indicator function $\delta_S$ is being used to enforce the constraint that all these vectors should be equal. We are now minimizing a sum of two convex functions, $f$ and $g$, which have easy prox-operators because $f$ is a separable sum of easy functions and $g$ is the indicator function of a set that we can project onto easily. So we are now able to apply the Douglas-Rachford method.

The Douglas-Rachford iteration is just three lines, and the code for this problem could probably be written in about a page of Matlab (unless projecting onto the sets $C_i$ is very complicated).
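As a concrete illustration, here is a minimal Python sketch of that consensus Douglas-Rachford iteration. It assumes the data term is taken as $\frac{1}{2}\|x - a\|^2$ (which has the same minimizer over the intersection as $\|x - a\|$), and it uses two hypothetical sets, a halfspace and a Euclidean ball, whose projections are cheap:

```python
import numpy as np

def dr_project_intersection(a, projections, gamma=1.0, iters=1000):
    """Douglas-Rachford with the consensus trick for
    min 0.5*||x - a||^2  s.t.  x in the intersection of the C_i."""
    m = len(projections)
    z = np.tile(a.astype(float), (m + 1, 1))  # one block per summand of f
    for _ in range(iters):
        # x = prox_{gamma f}(z): f is block separable
        x = np.empty_like(z)
        x[0] = (z[0] + gamma * a) / (1.0 + gamma)  # prox of 0.5*||. - a||^2
        for i, P in enumerate(projections):
            x[i + 1] = P(z[i + 1])                 # prox of an indicator = projection
        # y = prox_{gamma g}(2x - z): project onto the consensus set S (blockwise mean)
        r = 2.0 * x - z
        y = np.tile(r.mean(axis=0), (m + 1, 1))
        z = z + y - x
    return x.mean(axis=0)

# toy sets (assumptions for the demo): a halfspace and a Euclidean ball
proj_halfspace = lambda v: np.array([max(v[0], 1.0), *v[1:]])            # {x : x_0 >= 1}
proj_ball = lambda v: v * min(1.0, 2.0 / max(np.linalg.norm(v), 1e-12))  # radius-2 ball

a = np.zeros(2)
x_star = dr_project_intersection(a, [proj_halfspace, proj_ball])
# the minimum-norm point of the intersection is (1, 0)
```

With `gamma = 1` the iterates settle quickly on this toy problem; in practice `gamma` and the stopping rule would need tuning.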

Answer 2 (score: 4)

I found an algorithm called the Hybrid Steepest Descent Method (explained in detail in the paper The Hybrid Steepest Descent Method for the Variational Inequality Problem Over the Intersection of Fixed Point Sets of Nonexpansive Mappings) that can solve problems of the form

$$x^* = \arg \min_x f(x) \text{ s.t. } x \in \cap_i C_i$$

by iterating

$$ \begin{align} y^t & = (1-\beta)x^{t}+\beta\sum_{i}w_{i}P_{C_{i}}(x^{t}) \\ x^{t+1} & = y^{t}-\lambda_{t+1}\mu \nabla f(y^t) \end{align} $$

where $\mu>0$ is sufficiently small, $\lambda_t = 1/t$, $0<\beta\le 2$, $P_{C_i}$ is the projection onto the set $C_i$, the weights satisfy $w_i \in (0,1]$ and $\sum_i w_i = 1$, and $\nabla f$ is the gradient of $f$.

So to solve the problem

$$x^* = \arg \min_x \Vert x-a \Vert^2 \text{ s.t. } x \in \cap_i C_i $$ we simply need to substitute $$f(x)=\frac{1}{2} \Vert x-a\Vert^2 \rightarrow \nabla f(x)=x-a$$
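A minimal Python sketch of this iteration, under the stated choices $\lambda_t = 1/t$ and $f(x)=\frac{1}{2}\Vert x-a\Vert^2$; the two sets (a halfspace and a ball) and all parameter values are illustrative assumptions:

```python
import numpy as np

def hybrid_steepest_descent(a, projections, weights=None, mu=0.5, beta=1.0, iters=2000):
    """Hybrid Steepest Descent for min 0.5*||x - a||^2 s.t. x in every C_i."""
    m = len(projections)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights)
    x = a.copy()
    for t in range(1, iters + 1):
        # relaxed averaged-projection step
        y = (1 - beta) * x + beta * sum(wi * P(x) for wi, P in zip(w, projections))
        # gradient step on f, with step size lambda_t * mu = mu / t
        x = y - (mu / t) * (y - a)
    return x

# toy sets (assumptions for the demo): a halfspace and a Euclidean ball
proj_halfspace = lambda v: np.array([max(v[0], 1.0), *v[1:]])            # {x : x_0 >= 1}
proj_ball = lambda v: v * min(1.0, 2.0 / max(np.linalg.norm(v), 1e-12))  # radius-2 ball

a = np.zeros(2)
x_star = hybrid_steepest_descent(a, [proj_halfspace, proj_ball])
# converges to the minimum-norm point of the intersection, (1, 0)
```

Note that the diminishing step $\lambda_t = 1/t$ makes the convergence slow (roughly $O(1/t)$ on this toy problem), which is the price paid for the very low memory footprint.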

Answer 3 (score: 1)

Projection onto Convex Sets (POCS) / Alternating Projections does exactly what you want in case your sets $ \left\{ \mathcal{C}_{i} \right\}_{i = 1}^{m} $ are subspaces.

Namely, if $ \mathcal{C} = \bigcap_{i = 1}^{m} \mathcal{C}_{i} $, where each $ {\mathcal{C}}_{i} $ is a subspace, and the projection onto the set is given by:

$$ \mathcal{P}_{\mathcal{C}} \left( y \right) = \arg \min_{x \in \mathcal{C}} \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} $$

Then:

$$ \lim_{n \to \infty} {\left( \mathcal{P}_{\mathcal{C}_{1}} \circ \mathcal{P}_{\mathcal{C}_{2}} \circ \cdots \circ \mathcal{P}_{\mathcal{C}_{m}} \right)}^{n} \left( y \right) = \mathcal{P}_{\mathcal{C}} \left( y \right) $$

Where $ \mathcal{P}_{\mathcal{C}_{i}} \left( y \right) = \arg \min_{x \in \mathcal{C}_{i}} \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} $.
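A small Python sketch of the subspace case; the two planes in $ \mathbb{R}^{3} $ are illustrative assumptions, chosen so the limit can be checked by hand:

```python
import numpy as np

def alternating_projections(y, projections, sweeps=100):
    """Cyclic (alternating) projections: one sweep applies every projection once."""
    x = y.astype(float).copy()
    for _ in range(sweeps):
        for P in projections:
            x = P(x)
    return x

# two subspaces of R^3 (demo assumption): the plane z = 0 and the plane y = z;
# their intersection is the x-axis
P1 = lambda v: np.array([v[0], v[1], 0.0])
P2 = lambda v: np.array([v[0], (v[1] + v[2]) / 2, (v[1] + v[2]) / 2])

y = np.array([3.0, 2.0, -1.0])
x_star = alternating_projections(y, [P1, P2])
# orthogonal projection of y onto the x-axis: (3, 0, 0)
```

Each sweep halves the off-axis components here, so the composition converges geometrically to $ \mathcal{P}_{\mathcal{C}} \left( y \right) $, as the limit above states.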

In case any of the sets isn't a subspace but a general convex set, you need to use Dykstra's Projection Algorithm.

In the paper Ryan J. Tibshirani - Dykstra's Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions you may find an extension of the algorithm to $ m \geq 2 $ sets:

(Figure: pseudocode of Dykstra's algorithm for $ m $ sets, from the paper above.)
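A minimal Python sketch of that multi-set extension (cyclic projections with one correction term per set); the two halfspaces are illustrative assumptions, chosen because plain alternating projections demonstrably fails on them:

```python
import numpy as np

def dykstra(y, projections, sweeps=100):
    """Dykstra's algorithm for m >= 2 convex sets: cyclic projections,
    each preceded by adding back that set's correction (increment) term."""
    x = y.astype(float).copy()
    incs = [np.zeros_like(x) for _ in projections]
    for _ in range(sweeps):
        for i, P in enumerate(projections):
            tmp = x + incs[i]
            x = P(tmp)
            incs[i] = tmp - x
    return x

# demo halfspaces (assumptions): {x : x_1 <= 0} and {x : x_0 + x_1 <= 0}
P1 = lambda v: np.array([v[0], min(v[1], 0.0)])
P2 = lambda v: v - (max(v[0] + v[1], 0.0) / 2.0) * np.ones(2)

y = np.array([1.0, 1.0])
x_star = dykstra(y, [P1, P2])
# true projection of y onto the intersection is (0, 0);
# plain alternating projections would stop at the feasible point (0.5, -0.5)
```

Dropping the `incs` terms recovers plain alternating projections, which on this example stalls at a feasible but non-optimal point; the correction terms are exactly what restores the orthogonal projection.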

I wrote a MATLAB Code which is accessible in my StackExchange Mathematics Q1492095 GitHub Repository.

I implemented both methods and a simple test case which shows when the Alternating Projections Method doesn't converge to the orthogonal projection onto the intersection of the sets.

I also implemented 2 more methods:

  1. Quadratic form of the Hybrid Steepest Descent (see Quadratic Optimization of Fixed Points of Nonexpansive Mappings in Hilbert Space). I implemented 2 variations of this.
  2. Consensus ADMM method which is related to the Dykstra algorithm from above.

While the Dykstra and ADMM methods were the fastest, they also require much more memory than the Hybrid Steepest Descent method. So if one can afford the memory consumption, the ADMM / Dykstra methods are the ones to choose; if memory is a constraint, the Hybrid Steepest Descent method is the way to go.

Answer 4 (score: 0)

ADMM Consensus Trick and Orthogonal Projection onto an Intersection of Convex Sets

The ADMM Consensus Optimization

Given a problem of the form:

$$\begin{aligned} \arg \min_{ \boldsymbol{x} } \quad & \sum_{i = 1}^{n} {f}_{i} \left( \boldsymbol{x} \right) \\ \end{aligned}$$

By introducing auxiliary variables one can rewrite it as:

$$\begin{aligned} \arg \min_{ \left\{ \boldsymbol{x}_{i} \right\}, \boldsymbol{z} } \quad & \sum_{i = 1}^{n} {f}_{i} \left( \boldsymbol{x}_{i} \right) \\ \text{subject to} \quad & \boldsymbol{x}_{i} = \boldsymbol{z}, \; i = 1, 2, \ldots, n \\ \end{aligned}$$

Namely, it is in a separable form with an equality constraint on each variable.
In matrix form it can be written as:

$$\begin{aligned} \arg \min_{ \boldsymbol{u}, \boldsymbol{z} } \quad & \sum_{i = 1}^{n} {f}_{i} \left( \boldsymbol{x}_{i} \right) \\ \text{subject to} \quad & \boldsymbol{u} := \begin{bmatrix} \boldsymbol{x}_{1} \\ \boldsymbol{x}_{2} \\ \vdots \\ \boldsymbol{x}_{n} \end{bmatrix} = \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z} \end{aligned}$$

By defining $ {E}_{i} $ to be the matrix such that $ \boldsymbol{x}_{i} = {E}_{i} \boldsymbol{u} $, namely the selector of the appropriate sub vector from $ \boldsymbol{u} $, we can write the problem in the ADMM form (the Scaled ADMM form):

$$\begin{aligned} \boldsymbol{u}^{k + 1} & = \arg \min_{ \boldsymbol{u} } \sum_{i = 1}^{n} {f}_{i} \left( {E}_{i} \boldsymbol{u} \right) + \frac{\rho}{2} {\left\| \boldsymbol{u} - \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z}^{k} + \boldsymbol{\lambda}^{k} \right\|}_{2}^{2} \\ \boldsymbol{z}^{k + 1} & = \arg \min_{ \boldsymbol{z} } \frac{\rho}{2} {\left\| \boldsymbol{u}^{k + 1} - \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z} + \boldsymbol{\lambda}^{k} \right\|}_{2}^{2} \\ \boldsymbol{\lambda}^{k + 1} & = \boldsymbol{\lambda}^{k} + \boldsymbol{u}^{k + 1} - \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \boldsymbol{z}^{k + 1} \\ \end{aligned}$$

Since the form is block separable it can be written in an element form:

$$\begin{aligned} \boldsymbol{x}_{i}^{k + 1} & = \arg \min_{ \boldsymbol{x}_{i} } {f}_{i} \left( \boldsymbol{x}_{i} \right) + \frac{\rho}{2} {\left\| \boldsymbol{x}_{i} - \boldsymbol{z}^{k} + \boldsymbol{\lambda}_{i}^{k} \right\|}_{2}^{2}, & i = 1, 2, \ldots, n \\ \boldsymbol{z}^{k + 1} & = \arg \min_{ \boldsymbol{z} } \frac{\rho}{2} \sum_{i = 1}^{n} {\left\| \boldsymbol{x}_{i}^{k + 1} - \boldsymbol{z} + \boldsymbol{\lambda}_{i}^{k} \right\|}_{2}^{2} \\ \boldsymbol{\lambda}_{i}^{k + 1} & = \boldsymbol{\lambda}_{i}^{k} + \boldsymbol{x}_{i}^{k + 1} - \boldsymbol{z}^{k + 1}, & i = 1, 2, \ldots, n \\ \end{aligned}$$

Remark: Pay attention that $ \boldsymbol{z}^{k + 1} = \frac{1}{n} \sum_{i = 1}^{n} \left( \boldsymbol{x}_{i}^{k + 1} + \boldsymbol{\lambda}_{i}^{k} \right) $. Namely the mean value of the set $ { \left\{ \boldsymbol{x}_{i}^{k + 1} + \boldsymbol{\lambda}_{i}^{k} \right\} }_{i = 1}^{n} $.

The Proximal Operator is given by $ \operatorname{Prox}_{\lambda f \left( \cdot \right)} \left( \boldsymbol{y} \right) = \arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} + \lambda f \left( \boldsymbol{x} \right) $. Using it, and simplifying the optimization for $ \boldsymbol{z}^{k + 1} $, one can write the above as:

$$\begin{aligned} \boldsymbol{x}_{i}^{k + 1} & = \operatorname{Prox}_{\frac{1}{\rho} {f}_{i} \left( \cdot \right)} \left( \boldsymbol{z}^{k} - \boldsymbol{\lambda}_{i}^{k} \right), & i = 1, 2, \ldots, n \\ \boldsymbol{z}^{k + 1} & = \frac{1}{n} \sum_{i = 1}^{n} \left( \boldsymbol{x}_{i}^{k + 1} + \boldsymbol{\lambda}_{i}^{k} \right) \\ \boldsymbol{\lambda}_{i}^{k + 1} & = \boldsymbol{\lambda}_{i}^{k} + \boldsymbol{x}_{i}^{k + 1} - \boldsymbol{z}^{k + 1}, & i = 1, 2, \ldots, n \\ \end{aligned}$$
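A minimal Python sketch of this element-wise consensus iteration, tested on illustrative quadratics $ {f}_{i} \left( \boldsymbol{x} \right) = \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{c}_{i} \right\|}_{2}^{2} $, whose prox is available in closed form:

```python
import numpy as np

def consensus_admm(proxes, n_dim, rho=1.0, iters=200):
    """Scaled consensus ADMM: prox steps, an averaging step, and a dual update."""
    n = len(proxes)
    z = np.zeros(n_dim)
    lam = np.zeros((n, n_dim))
    x = np.zeros((n, n_dim))
    for _ in range(iters):
        for i, prox in enumerate(proxes):
            x[i] = prox(z - lam[i], rho)   # x_i = Prox_{(1/rho) f_i}(z - lam_i)
        z = (x + lam).mean(axis=0)          # z = mean of x_i + lam_i
        lam += x - z                        # lam_i += x_i - z
    return z

# toy check: f_i(x) = 0.5*||x - c_i||^2, whose Prox_{(1/rho) f_i}(v) = (c_i + rho*v)/(1 + rho)
cs = [np.array([0.0, 0.0]), np.array([2.0, 4.0]), np.array([4.0, 2.0])]
proxes = [lambda v, rho, c=c: (c + rho * v) / (1 + rho) for c in cs]
z_star = consensus_admm(proxes, 2)
# the minimizer of sum_i 0.5*||x - c_i||^2 is the mean of the c_i, i.e. (2, 2)
```

On this toy problem the consensus error halves each iteration, so a couple hundred iterations reach machine precision.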

Orthogonal Projection onto an Intersection of Convex Sets

The problem is given by:

$$\begin{aligned} \arg \min_{ \boldsymbol{x} } \quad & \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} \\ \text{subject to} \quad & \boldsymbol{x} \in \bigcap_{i = 1}^{n} \mathcal{C}_{i} \\ \end{aligned}$$

Namely, we're looking for the orthogonal projection of $ \boldsymbol{y} $ onto the intersection of the sets $ {\left\{ \mathcal{C}_{i} \right\}}_{i = 1}^{n} $.

The problem could be rewritten as:

$$\begin{aligned} \arg \min_{ \boldsymbol{x} } \quad & \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} + \sum_{i = 1}^{n} {\delta}_{\mathcal{C}_{i}} \left( \boldsymbol{x} \right) \end{aligned}$$

Where $ {\delta}_{\mathcal{C}_{i}} (x) = \begin{cases} 0 & x \in \mathcal{C}_{i} \\ \infty & x \notin \mathcal{C}_{i} \end{cases} $.

So we can use the ADMM Consensus Optimization by setting $ {f}_{1} \left( \boldsymbol{x} \right) = \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} $ and $ {f}_{i} \left( \boldsymbol{x} \right) = {\delta}_{\mathcal{C}_{i - 1}} \left( \boldsymbol{x} \right), \; i = 2, 3, \ldots, n + 1 $.

In order to solve it we need the Proximal Operator of each function. For $ {f}_{1} \left( \cdot \right) $ it is given by:

$$ \operatorname{Prox}_{ \frac{1}{\rho} {f}_{1} \left( \cdot \right)} \left( \boldsymbol{v} \right) = \arg \min_{ \boldsymbol{x} } \frac{1}{2} {\left\| \boldsymbol{x} - \boldsymbol{y} \right\|}_{2}^{2} + \frac{\rho}{2} {\left\| \boldsymbol{x} - \boldsymbol{v} \right\|}_{2}^{2} = \frac{\rho \boldsymbol{v} + \boldsymbol{y}}{1 + \rho} $$

For the rest the Proximal Operator is the orthogonal projection:

$$ \operatorname{Prox}_{ \frac{1}{\rho} {f}_{i} \left( \cdot \right)} \left( \boldsymbol{v} \right) = \operatorname{Proj}_{ \mathcal{C}_{i - 1} } \left( \boldsymbol{v} \right), \; i = 2, 3, \ldots, n + 1 $$
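Putting the pieces together, here is a minimal Python sketch of the full scheme; the two sets (a unit ball and a halfspace) and the parameter values are illustrative assumptions:

```python
import numpy as np

def project_intersection(y, projections, rho=1.0, iters=3000):
    """Consensus ADMM for min 0.5*||x - y||^2 s.t. x in every C_i:
    f_1 is the data term, the remaining f_i are set indicators."""
    n = 1 + len(projections)
    z = y.astype(float).copy()
    lam = np.zeros((n, y.size))
    x = np.zeros((n, y.size))
    for _ in range(iters):
        x[0] = (rho * (z - lam[0]) + y) / (1.0 + rho)  # prox of (1/rho) f_1
        for i, P in enumerate(projections):
            x[i + 1] = P(z - lam[i + 1])               # prox of an indicator = projection
        z = (x + lam).mean(axis=0)                     # consensus (averaging) step
        lam += x - z                                   # dual update
    return z

# demo sets (assumptions): the unit ball and the halfspace {x : x_0 >= 1/2}
proj_ball = lambda v: v / max(np.linalg.norm(v), 1.0)
proj_half = lambda v: np.array([max(v[0], 0.5), v[1]])

y = np.array([0.0, 2.0])
x_star = project_intersection(y, [proj_ball, proj_half])
# the projection of y onto the intersection is (1/2, sqrt(3)/2)
```

Only the per-set projections and a handful of dense vectors are needed, which is why this consensus scheme drops in cleanly whenever each $ \operatorname{Proj}_{\mathcal{C}_{i}} $ is cheap.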