Solve Linear Least Squares Problem with Unit Simplex Constraint


$$ \min_x \|Ax - b\|_2\; \;\text{given }x \geq 0\;\;\text{and}\;\;\textbf{1}^Tx = 1 $$ I am trying to solve the optimization problem above. I have been using common quadratic programming libraries, but they are too slow; I believe this problem admits a much more specialized routine. I was able to find non-negative least squares solvers, but they do not support the additional linear equality constraint. I read in a few articles online that the dimensionality of the problem can be reduced by substituting $x_n = 1 - \sum_{i=1}^{n-1}x_i$ and then optimizing with non-negative least squares (but in such cases, shouldn't we also constrain $\sum_{i=1}^{n-1} x_i$ to be at most 1?)

Thanks :)

Edit: I am really sorry; I have changed the $>$ condition to $\geq$.
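One way to sanity-check the substitution question is to solve the full problem directly with a general-purpose solver. The sketch below uses SciPy's SLSQP on random data (not fast, just a correct baseline; the names `objective` and `gradient` are illustrative, not from any library). Note that after the substitution $x_n = 1 - \sum_{i=1}^{n-1} x_i$, the constraint $x_n \geq 0$ is exactly $\sum_{i=1}^{n-1} x_i \leq 1$, so yes, that constraint must be kept.

```python
# Sketch: solve min 0.5*||Ax - b||^2 s.t. x >= 0, 1^T x = 1 with SciPy SLSQP.
# Assumes numpy and scipy are available; random data stands in for A, b.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def objective(x):
    r = A @ x - b
    return 0.5 * r @ r

def gradient(x):
    return A.T @ (A @ x - b)

x0 = np.full(n, 1.0 / n)  # start at the center of the simplex
res = minimize(objective, x0, jac=gradient, method="SLSQP",
               bounds=[(0.0, None)] * n,            # x >= 0
               constraints=[{"type": "eq",          # 1^T x = 1
                             "fun": lambda x: x.sum() - 1.0}])
x_hat = res.x
```

This is exactly the "too slow" general-purpose route the question mentions, but it gives a reference solution to compare the specialized methods below against.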

There are 3 answers below.

Answer (1 vote):

The problem is known as a standard quadratic program (StQP, see pg. 3 for details). If you relax the constraint on the sum of the entries to a sum of the absolute values of the entries, and the strict inequality to a non-strict one, one can show by duality that this problem has an $\ell_{1}$-minimization interpretation that can be solved quickly. The un-relaxed problem is more complex to solve.

Generally the solutions to these problems have many zero entries when $x>0$ is replaced with $x\geq 0$, so the relaxation may not have the properties you want.

Answer (5 votes):

If the problem is very large then a proximal algorithm such as FISTA could be a good choice. You can formulate the problem as minimizing $f(x) + g(x)$ where $$ f(x) = \frac12 \|Ax-b\|_2^2 $$ and $g$ is the convex indicator function of the probability simplex. This is the correct problem form for the proximal gradient method and for accelerated proximal gradient methods such as FISTA. You will need to be able to compute the gradient of $f$ and the prox-operator of $g$. The gradient of $f$ is $$ \nabla f(x) = A^T (Ax-b). $$ The proximal operator of $g$ performs a projection onto the probability simplex. Methods for performing this projection efficiently can be found by googling.
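As a concrete sketch of the above, the following implements FISTA with the standard sort-based Euclidean projection onto the probability simplex (the function names `project_simplex` and `fista_simplex` are illustrative, not from any particular library; the Lipschitz constant of $\nabla f$ is taken as $\|A\|_2^2$):

```python
# Sketch of FISTA for f(x) = 0.5*||Ax - b||^2 with g = indicator of the
# probability simplex; prox of g = Euclidean projection onto the simplex.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, 1^T x = 1} (sort-based)."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)    # optimal shift
    return np.maximum(v - theta, 0.0)

def fista_simplex(A, b, iters=500):
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of grad f
    x = np.full(A.shape[1], 1.0 / A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(iters):
        # proximal gradient step at the extrapolated point y
        x_new = project_simplex(y - A.T @ (A @ y - b) / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum update
        x, t = x_new, t_new
    return x
```

Dropping the momentum update (setting `y = x_new`) recovers the plain proximal gradient method with the same per-iteration cost.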

You can learn more about proximal algorithms from Boyd's booklet called Proximal Algorithms (among other resources).

Answer (0 votes):

The problem is given by:

$$ \begin{alignat*}{3} \arg \min_{x} & \quad & \frac{1}{2} \left\| A x - b \right\|_{2}^{2} \\ \text{subject to} & \quad & x \succeq 0 \\ & \quad & \boldsymbol{1}^{T} x = 1 \end{alignat*} $$

I solved this in my answer to How to Project onto the Unit Simplex as Intersection of Two Sets (Optimizing a Convex Function)?

I wrote code to extend what I did there.
The MATLAB code is available in my StackExchange Mathematics Q2935650 GitHub Repository.

The solvers I implemented / used are as follows:

  1. CVX as a reference.
  2. Projected Gradient Descent with Projection onto the Unit Simplex as I implemented in Orthogonal Projection onto the Unit Simplex.
  3. Projected Gradient Descent with Projection onto the Unit Simplex implemented as Alternating Projections for the projection of the intersection of 2 convex sets. See Projections onto Convex Sets and How to Project onto the Unit Simplex as Intersection of Two Sets (Optimizing a Convex Function).
  4. Conditional Gradient Method (Frank Wolfe Algorithm).

The results are given by:

[Figure: comparison of the solvers' results; the image is available in the linked GitHub repository.]

Timing for your case of $ A \in \mathbb{R}^{1500 \times 500} $ is less than a second for each of the methods I implemented.

The Frank Wolfe method is especially simple for this case. The set is compact, hence convergence is guaranteed. Since the Unit Simplex is the convex hull of the standard basis vectors, the linear minimization sub-problem is simple:

$$ \arg \min_{s \in \Delta} \nabla f \left( x \right)^{T} s = e_{i^{\star}}, \qquad i^{\star} = \arg \min_{i} {\left( \nabla f \left( x \right) \right)}_{i} $$

Where $ f \left( x \right) = \frac{1}{2} {\left\| A x - b \right\|}_{2}^{2} $ and $ \Delta = \left\{ x \mid \boldsymbol{1}^{T} x = 1, \, x \succeq 0 \right\} $ is the Unit Simplex Set.
So the minimizer is the standard basis vector indexed by the minimal element of the gradient.

This method becomes really simple and very fast for your case.
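The loop above can be sketched as follows (illustrative Python rather than the linked MATLAB code, assuming the standard $\gamma_k = 2/(k+2)$ step size; `frank_wolfe_simplex` is a hypothetical name):

```python
# Sketch of the Frank-Wolfe (conditional gradient) method on the unit simplex.
# The linear oracle argmin_{s in simplex} <grad f(x), s> is the standard
# basis vector at the smallest gradient entry, as described above.
import numpy as np

def frank_wolfe_simplex(A, b, iters=2000):
    n = A.shape[1]
    x = np.full(n, 1.0 / n)                  # feasible starting point
    for k in range(iters):
        grad = A.T @ (A @ x - b)             # gradient of 0.5*||Ax - b||^2
        i = int(np.argmin(grad))             # oracle: simplex vertex e_i
        gamma = 2.0 / (k + 2.0)              # standard diminishing step size
        x *= (1.0 - gamma)                   # x <- (1 - gamma) x + gamma e_i
        x[i] += gamma
    return x
```

Each iteration costs one matrix-vector product pair and one `argmin`, and the iterate stays on the simplex by construction, which is why this method is so cheap here.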

All methods could be even faster by: