Solve Linear Least Squares with Squared $ {L}_{2} $ Norm Regularization (Tikhonov / Ridge Regression) with Non Negativity Constraint Using FASTA

495 Views Asked by At

The open source optimization library FASTA (Fast Adaptive Shrinkage/Thresholding Algorithm) is an efficient, easy-to-use implementation of the proximal gradient method for regularized optimization problems. Embeded solvers could solve LASSO, Ridge Regression and Matrix Complementry problems.

I want to solve the following problem using FASTA: $$\arg \min_x \{||Ax-b||^2+\lambda||x||^2 \} \\ s.t.\quad x\ge 0$$

Acordding to the follwing LASSO solver(in MATLAB):

%%  Define ingredients for FASTA
%  Note: LASSO solver solves min f(Ax)+g(x). 
%  A and b are given.
%  f(z) = .5 ||z - b||^2
f    = @(z) .5*norm(z-b,'fro')^2;
grad = @(z) z-b;
% g(z) = 0 for all feasible z, and infinity otherwise.  However, iterations
% always end with a feasbile z, so we need not consider the infinity case.
g = @(x) 0;
% proxg(z,t) = argmin t*mu*|x|+.5||x-z||^2
prox = @(z,t) project_1ball(z,mu);

%% Call solver
[solution, outs] = fasta(A,At,f,grad,g,prox,x0,opts);

I write the following nonnegative constraned Ridge Regression Solver:

%%  Define ingredients for FASTA
%  Note: fasta solves min f(Ax)+g(x).
%  f(z) = .5 ||z - b||^2
f    = @(z) .5*norm(z-b,'fro')^2;
grad = @(z) z-b;

g = @(x) 0.5*mu*norm(x,'fro')^2;
prox = @(x,t) mapping(x,t*mu);

%% Call solver
[solution, outs] = fasta(A,At,f,grad,g,prox,x0,opts);

%% Embeded function
function x = mapping(x,tau)
    x = x+tau;
    x(x<0) = 0;  % for non-negative constraint
end

But it did not converge. How to solve the NON-NEGATIVE constrained ridge regression problem using FASTA?

2

There are 2 best solutions below

0
On BEST ANSWER

The FASTA package basically uses some heuristic to better utilize acceleration, momentum methods and line search methods.

Their paper, A Field Guide to Forward Backward Splitting with a FASTA Implementation, describes the method. For background I'd also recommend reading:

I implemented 4 methods for your problem to benchmark different approaches:

  • Vanilla Projected Gradient Descent.
  • Projected Gradient Descent with Momentum.
  • Projected Gradient Descent Accelerated (Nesterov).
  • Projected Gradient Descent Accelerated (FISTA).

Those are the results I got with comparison to a reference given by CVX:

enter image description here

The MATLAB code is available at my StackExchange Mathematics Q3872982 GitHub Repository.

0
On

Let's say you want to minimize $\frac12 \| Ax - b \|^2 + \frac{\lambda}{2} \| x\|^2$ subject to the constraint that $x \geq 0$ using the proximal gradient method or an accelerated proximal gradient method. The simplest way to do this is to express your problem as minimizing $f(x) + g(x)$, where $f(x) = \frac12 \| Ax - b \|^2 + \frac{\lambda}{2} \| x\|^2$ and $g$ is the indicator function for the nonnegative orthant. The gradient of $f$ is $$ \nabla f(x) = A^T(Ax - b) + \lambda x. $$ The proximal operator of $g$ projects onto the nonnegative orthant: $\text{prox}_{tg}(x) = \max(x,0)$. (The maximum is interpreted component-wise.)

An alternative approach is to express your problem as minimizing $F(x) + G(x)$, where $F(x) = \frac12 \| Ax - b \|^2$ and $G(x) = \frac{\lambda}{2}\| x \|^2 + I_{\geq 0}(x)$. Here $I_{\geq 0}$ is the indicator function for the nonnegative orthant. The gradient of $F$ is $\nabla F(x) = A^T (Ax - b)$. To evaluate $\text{prox}_{tG}(\hat x)$, we must minimize $\frac{\lambda}{2} \| x \|^2 + \frac{1}{2t} \| x - \hat x \|^2$ subject to the constraint that $x \geq 0$. Completing the square, this is equivalent to minimizing $\| x - \frac{\hat x}{\lambda t + 1} \|^2$ subject to the constraint that $x \geq 0$. The minimizer is $$ \text{prox}_{tG}(\hat x) = \max \left( \frac{\hat x}{\lambda t + 1} , 0 \right). $$ Note that this disagrees with the proximal mapping given in your code, which is probably the reason the algorithm failed to converge.