I am starting with a typical $\ell_1$ basis pursuit problem:
$$ \min_{\mathbf{x}} \Vert \mathbf{x} \Vert_1 \quad \mathrm{s.t.} \quad \Vert \mathbf{ERx} - \mathbf{y} \Vert_2 \leq \epsilon, $$
where $\mathbf{R}\in\mathbb{C}^{M \times P}$, and $\mathbf{E}$ is an $M' \times M$ row-selection matrix that randomly selects $M'$ of the $M$ rows of $\mathbf{R}$, with $M' < P$. It is also known that $\operatorname{rank}(\mathbf{ER})=M'$, so that the constraint system is full-rank but under-determined.
Since both $M$ and $P$ are on the order of $10^7$, the above problem is impractical because of memory and computational limitations. An algorithm exists that will let me efficiently compute $\mathbf{R^{\mathrm{H}}}\mathbf{E^{\mathrm{H}}}\mathbf{y}$, so I would like to see if I can combine this with a sparse approximation (described below) to solve the following equivalent problem:
$$ \min_{\mathbf{x}} \Vert \mathbf{x} \Vert_1 \quad \mathrm{s.t.} \quad \Vert \mathbf{R^{\mathrm{H}}}\mathbf{E^{\mathrm{H}}}\mathbf{ERx} - \mathbf{R^{\mathrm{H}}}\mathbf{E^{\mathrm{H}}}\mathbf{y} \Vert_2 \leq \epsilon. $$
Since $\mathbf{E}$ is a row-selection matrix, the product $\mathbf{E^{\mathrm{H}}}\mathbf{E}$ is a diagonal matrix with a $1$ at element $(i,i)$ if row $i$ was selected and $0$ otherwise. Furthermore, given the nature of $\mathbf{R}$, even though the product $\mathbf{R^{\mathrm{H}}}\mathbf{E^{\mathrm{H}}}\mathbf{ER}$ is rather large at $P \times P$, it is also approximately sparse, with only a relatively small number of entries of significant modulus per row/column. This makes me think that the above form is tractable, since I could compute just the sparse approximation and avoid the unrealistic memory and computation requirements.
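As a small illustration of the diagonal structure (the sizes and selected indices below are made up), the action of $\mathbf{E^{\mathrm{H}}}\mathbf{E}$ can be applied without ever forming $\mathbf{E}$, simply by zeroing the unselected entries:

```matlab
% Illustrative sketch (all sizes and indices here are made up):
% apply E'*E without forming E, by zeroing unselected entries.
M   = 8;                  % number of rows of R
sel = [ 1 3 4 7 ];        % indices of the M' selected rows
v   = randn( M, 1 );      % stand-in for R*x

% Explicit construction, for comparison only.
E  = eye( M );
E  = E( sel, : );
w1 = E' * ( E * v );

% Matrix-free equivalent: a 0/1 diagonal mask.
keep        = false( M, 1 );
keep( sel ) = true;
w2          = v;
w2( ~keep ) = 0;

assert( norm( w1 - w2 ) < 1e-12 );
```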
However, $\mathbf{R^{\mathrm{H}}}\mathbf{E^{\mathrm{H}}}\mathbf{ER}$ has size $P \times P$ but only rank $M'$, and this causes problems with the basis pursuit algorithms, which expect the rows of the constraint matrix to be linearly independent. Hence my request for input.
Are there any factorizations and/or transformations that could turn the constraints of the second form of the problem into an under-determined yet full-rank system? Please do keep in mind the large sizes of these matrices. Any other suggestions/comments are also welcome. We can assume that $\mathbf{R}$ has rank $M$ or $P$, as there is freedom in its design.
Edit:
Below is the Matlab code used to implement the Pock-Chambolle BPDN solver described in the answer:
function [ x1, res, ii ] = pockchambolle( Rin, y, rho, tol, ...
                                          tau, sigma, maxiter )
% pockchambolle - Pock-Chambolle Basis Pursuit De-Noising
%
% SYNOPSIS
%   [ x1, res, ii ] = pockchambolle( Rin, y, rho, tol, ...
%                                    tau, sigma, maxiter )
%
% DESCRIPTION
%   Solves the BPDN problem using the Pock-Chambolle algorithm:
%
%     min ||x||_1  s.t.  || R*x - y ||_2 <= rho
%
% INPUT
%   Rin      Either a 2-D MxN array or a function handle that
%            will implement R*x. See NOTES for interface.
%   y        Mx1 constraint vector.
%   rho      Radius of the l2 constraint ball (noise level).
%   tol      Residual change level used to terminate iterations.
%   tau      Primal step size.
%   sigma    Dual step size; tau*sigma*||R||^2 <= 1 is required.
%   maxiter  Maximum number of iterations allowed.
%
% OUTPUT
%   x1       Nx1 vector of the estimated BPDN solution.
%   res      Residual change on last iteration.
%   ii       Number of iterations used to find the answer.
%
% NOTES
%   The format of the function handle is the same as for
%   SPGL1. That is, Rin(x,adj), where Rin(x,1) will return
%   R * x, and Rin(x,2) will return R'*x.

if( ~exist( 'maxiter', 'var' ) || isempty( maxiter ) )
    maxiter = 100;
end

% If the input is not already a function handle, we make
% it one so that the code is the same for both cases.
if( ~isa( Rin, 'function_handle' ) )
    R = @(x,adj) Rop( Rin, x, adj );
else
    R = Rin;
end

% Set the initial conditions.
v0    = y;
x0    = R( v0, 2 );
xbar0 = x0;

% Set the default termination condition.
if( ~exist( 'tol', 'var' ) || isempty( tol ) )
    tol = 1e-4;
end

for ii = 1 : maxiter
    % The Pock-Chambolle iteration.
    u     = v0 + sigma * ( R( xbar0, 1 ) );
    v1    = u - sigma * l2proj( u / sigma, rho, y );
    x1    = l1prox( x0 - tau * ( R( v1, 2 ) ), tau );
    xbar1 = 2 * x1 - x0;

    % Check the residual change to see if we can terminate.
    res = max( abs( x0(:) - x1(:) ) );
    if( res <= tol )
        break;
    end

    % Copy the buffers and repeat.
    v0    = v1;
    x0    = x1;
    xbar0 = xbar1;
end
end

% Dummy function to implement matrix multiplication.
function y = Rop( R, x, adj )
if( adj ~= 2 )
    y = R * x;
else
    y = R' * x;
end
end
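For reference, a minimal usage sketch. Everything here is made up for illustration: the sizes, the support of the true sparse vector, and the parameter values. The step sizes are chosen so that $\tau \sigma \Vert \mathbf{R} \Vert^2 \leq 1$, as the algorithm requires.

```matlab
% Illustrative usage sketch; all sizes and parameters are made up.
rng( 0 );
M = 20;  N = 50;
R = randn( M, N ) / sqrt( M );     % stand-in constraint matrix
x = zeros( N, 1 );
x( [ 3 17 42 ] ) = [ 1 -2 0.5 ];   % sparse ground truth
y = R * x;                         % noiseless data for simplicity

L     = norm( R );                 % ||R||, cheap at this size
tau   = 1 / L;
sigma = 1 / L;                     % tau*sigma*||R||^2 = 1

[ xhat, res, ii ] = pockchambolle( R, y, 1e-6, 1e-8, ...
                                   tau, sigma, 5000 );
```

With a small $\rho$ and noiseless data, `xhat` should approximate the sparse truth, though convergence speed depends on the step sizes.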
The $\ell_2$-ball projection:
function y = l2proj( x, r, b )
% l2proj - Projection onto an arbitrary l2 ball
%
% SYNOPSIS
%   y = l2proj( x, r, b )
%
% DESCRIPTION
%   Implements the projection onto the convex set
%     C = { x | || x - b ||_2 <= r }.
%
% INPUT
%   x  Nx1 input array to project onto the ball.
%   r  Scalar radius of the ball.
%   b  Nx1 array of the ball's center.
%
% OUTPUT
%   y  Nx1 array of the input's projection.

if( ~exist( 'b', 'var' ) || isempty( b ) )
    b = 0;
else
    if( ~isequal( size( x ), size( b ) ) )
        error( 'Input args ''x'' and ''b'' must be same size.' );
    end
end

% Vector from the input to the center of the ball and
% its distance.
z = b - x;
d = norm( z(:) );

% If the distance is less than the radius of the ball,
% then there is nothing to do. Otherwise, we simply
% move the input to the boundary of the ball.
if( d <= r )
    y = x;
else
    y = x + ( d - r ) * z ./ d;
end
end
The $\ell_1$ proximal mapping function:
function y = l1prox( x, tau )
% l1prox - Proximal mapping for the l1 norm
%
% SYNOPSIS
%   y = l1prox( x, tau )
%
% DESCRIPTION
%   The proximal mapping operator for the l1 norm.
%   This is equivalent to soft thresholding.
%
% INPUT
%   x    Input array to which to apply the proximal mapping.
%   tau  Proximal mapping scale factor.
%
% OUTPUT
%   y    Proximal mapping of the input array.

tau   = abs( tau );
xmod  = abs( x );
zinds = xmod <= tau;

% Shrink each entry towards the origin by tau, preserving phase.
% Entries inside the threshold are zeroed afterwards, which also
% removes any NaNs from the 0/0 division when x is exactly zero.
y = x - tau * x ./ xmod;
y(zinds) = 0.0;
end
SPGL1 appears (from a quick glance at the documentation) to solve a constrained least-squares problem at each iteration, which I think is too expensive for a very large problem. It might be worth writing your own implementation of Pock-Chambolle, which can be done in 10 or 20 lines of Matlab.
What you're calling $y$, let's call $b$ instead. ($y$ is used for something else in the Pock-Chambolle paper.)
Let $C = \{v \mid \|v - b \|_2 \leq \epsilon \}$ and let $\delta_C$ be the indicator function of $C$. So $\delta_C(u) = \begin{cases} 0 \quad \text{if } u \in C\\ \infty \quad \text{otherwise.}\end{cases}$
Your problem has the form minimize $F(Kx) + G(x)$ where $K = E R$ and \begin{equation} G(x) = \| x \|_1 \end{equation} and \begin{equation} F(u) = \delta_C(u). \end{equation}
This is the problem form Pock-Chambolle uses.
Select step sizes $\sigma$ and $\tau$. (They must satisfy $\tau \sigma \|K\|^2\leq 1$.) Let $\theta = 1$. The Pock-Chambolle iteration is: \begin{align} y^{n+1} &= \text{prox}_{\sigma F^*}(y^n + \sigma K \bar{x}^n) \\ x^{n+1} &= \text{prox}_{\tau G}(x^n - \tau K^* y^{n+1}) \\ \bar{x}^{n+1} &= x^{n+1} + \theta ( x^{n+1} - x^n). \end{align} Here $F^*$ is the "conjugate" of $F$.
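To pick the step sizes you need (an upper bound on) $\|K\|$, which can be estimated matrix-free with a few power iterations on $K^*K$. A sketch, using the same `Rin(x,adj)`-style handle convention as the question's code (the fixed iteration count of 20 is a rough guess, not a tuned value):

```matlab
% Estimate ||K|| = sqrt( largest eigenvalue of K'*K ) by power
% iteration, using only a K(x,adj)-style handle (adj==2 means
% adjoint). The iteration count (20) is a rough guess.
function L = opnorm_est( K, n )
    v = randn( n, 1 );
    v = v / norm( v );
    for k = 1 : 20
        w = K( K( v, 1 ), 2 );   % apply K'*(K*v)
        v = w / norm( w );
    end
    % For (approximately) the leading eigenvector v,
    % ||K'*K*v|| is approximately the leading eigenvalue.
    L = sqrt( norm( K( K( v, 1 ), 2 ) ) );
end
```

Then, e.g., $\tau = \sigma = 1/L$ satisfies $\tau \sigma \|K\|^2 \leq 1$ (up to the estimation error, so a small safety margin on $L$ is prudent).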
Short formulas for the "prox operators" appearing in this iteration are given in Vandenberghe's 236c notes. The prox operator of the $1$-norm ($\text{prox}_{\tau G}$) takes an input vector $w$ and simply "shrinks" each component of $w$ by $\tau$. In other words, each component of $w$ is moved towards the origin by a distance $\tau$. (If you hit the origin then you stop, and that component is set to $0$.)
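Concretely, for $G = \|\cdot\|_1$ the shrinkage is the standard soft-thresholding formula, applied componentwise:
\begin{equation} \left(\text{prox}_{\tau G}(w)\right)_i = \frac{w_i}{|w_i|}\, \max\left(|w_i| - \tau,\, 0\right), \end{equation}
with the convention that the $i$-th component is $0$ when $w_i = 0$. Writing the phase factor as $w_i/|w_i|$ (rather than $\text{sign}(w_i)$) makes the same formula work for complex-valued $w$, which matters for your complex $\mathbf{R}$.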
The prox operator of $F^*$ can be computed in terms of the prox operator of $F$ using the Moreau decomposition formula. And the prox operator of $F$ simply projects onto $C$.
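Written out, since the prox of an indicator function is the projection onto its set (at any scale), the Moreau decomposition $u = \text{prox}_{\sigma F^*}(u) + \sigma\, \text{prox}_{\sigma^{-1} F}(u/\sigma)$ gives
\begin{equation} \text{prox}_{\sigma F^*}(u) = u - \sigma\, \Pi_C(u/\sigma), \end{equation}
where $\Pi_C$ is the projection onto the ball:
\begin{equation} \Pi_C(z) = b + \min\left(1,\, \frac{\epsilon}{\|z - b\|_2}\right) (z - b). \end{equation}
(The $\min$ handles both cases: points already in $C$ are left unchanged, points outside are moved to the boundary.)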
Edit: If you had $\| Ax \|_1$ instead of $\|x\|_1$ in the objective, you could write your problem as minimizing $F(Kx) + G(x)$ where now $G(x) = 0$, $Kx = (Ax,ERx)$ (we are thinking of $K$ as a linear transformation), and
\begin{equation} F(y_1,y_2) = \|y_1\|_1 + \delta_C(y_2). \end{equation} With the problem expressed in this way, we can now use Pock-Chambolle. Evaluating the prox operator of $F$ separates into two independent problems.
Using block notation, \begin{equation} K = \begin{bmatrix} A \\ ER \end{bmatrix} \text{ and } Kx = \begin{bmatrix} Ax \\ ERx \end{bmatrix}. \end{equation}
So \begin{equation} K^* = \begin{bmatrix} A^* & (ER)^* \end{bmatrix} \text{ and } K^* \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = A^* y_1 + (ER)^* y_2. \end{equation}
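In code, the stacked operator can stay matrix-free with a handle in the same `Rin(x,adj)` style used earlier. A sketch, where `Aop` and `ERop` are assumed to be handles following that same convention, and packing the two dual blocks $(y_1, y_2)$ into a cell array is just one possible design choice:

```matlab
% Sketch of the block operator K = [A; ER] in Rin(x,adj) style.
% Aop and ERop are assumed handles with the same interface;
% the cell-array packing of the dual blocks is arbitrary.
function out = Kop( Aop, ERop, in, adj )
    if( adj ~= 2 )
        % Forward: K*x = [ A*x ; ER*x ].
        out = { Aop( in, 1 ), ERop( in, 1 ) };
    else
        % Adjoint: K'*[y1; y2] = A'*y1 + (ER)'*y2.
        out = Aop( in{1}, 2 ) + ERop( in{2}, 2 );
    end
end
```

The prox of $F$ then acts blockwise: soft-threshold the $y_1$ block and project the $y_2$ block onto $C$, exactly as in the separated form above.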