I'm going to solve the linear system
$$Ax=b$$
where $x$ is sparse. To do that, I need to minimize the following cost function; this is called lasso regression:
$$J(x) = \|Ax - b\|_2^2 + \lambda \|x\|_1$$
where $\lambda > 0$. Notice that both $A$ and $b$ contain 8-bit unsigned integers, i.e. values in $0 \ldots 255$.
To solve for $x$, we can use Sequential Thresholded Least Squares (STLS):
```matlab
function x = stls_regression(A, b, lambda)
    % Sequential Thresholded Least Squares:
    % repeatedly zero out small coefficients and re-solve on the rest
    x = linsolve2(A, b);
    state_dimension = size(b, 2);
    for k = 1:10
        smallinds = (abs(x) < lambda);  % coefficients below the threshold
        x(smallinds) = 0;
        for ind = 1:state_dimension
            biginds = ~smallinds(:, ind);
            x(biginds, ind) = linsolve2(A(:, biginds), b(:, ind));
        end
    end
end
```
```matlab
function x = linsolve2(A, b)
    % Normal equations: forward-eliminate A'*A, then back-substitute
    [A, b] = backsubstitution(A'*A, A'*b);
    x = solveUp(A, b);

    % Alternatives (uncomment one and comment out the lines above,
    % so the unused factorizations are not computed for nothing):

    % Gauss-Jordan
    %x = gauss_jordan(A'*A, A'*b);

    % Least squares via the explicit inverse
    %x = inv(A'*A)*A'*b;

    % LUP factorization
    %[L, U, P] = lu(A'*A);
    %y = L\(P*A'*b);
    %x = U\y;

    % QR factorization
    %[Q, R] = qr2(A);
    %x = linsolve(R, Q'*b);

    % Pseudo inverse
    %[U, S, V] = svd(A, 'econ');
    %x = V*inv(S)*U'*b;
end
```
```matlab
function [y] = gauss_jordan(K, L)
    x = [K L];  % augmented matrix
    for n = 1:(length(x)-1)
        % Step 1: make row n's nth term 1 by dividing
        % the whole row by it
        A = x(n,:);
        A = A/A(n);
        x(n,:) = A;
        % Step 2: to every other row, add -1 * that row's nth term *
        % the nth row
        for k = 1:(length(x)-1)
            if n ~= k
                x(k,:) = A*(-1*x(k,n)) + x(k,:);
            end
        end
    end
    y = x(:, length(x));  % the last column holds the solution
end
```
```matlab
function [x] = solveUp(A, b)
    % Back substitution for an upper triangular system A*x = b
    s = length(A);
    x = zeros(s, 1);
    x(s) = b(s)/A(s,s);
    for i = s-1:-1:1
        acc = 0;  % accumulator (avoids shadowing the built-in sum)
        for j = s:-1:i+1
            acc = acc + A(i,j)*x(j);
        end
        x(i) = (b(i) - acc)/A(i,i);
    end
end
```
```matlab
function [A, b] = backsubstitution(A, b)
    % Forward elimination without pivoting: reduces A to upper
    % triangular form (the actual back substitution is in solveUp)
    s = length(A);
    for j = 1:(s-1)
        for i = s:-1:j+1
            m = A(i,j)/A(j,j);
            A(i,:) = A(i,:) - m*A(j,:);
            b(i) = b(i) - m*b(j);
        end
    end
end
```
```matlab
function [Q, R] = qr2(A)
    % Compute the QR decomposition of an m-by-n matrix A using
    % Householder transformations.
    [m, n] = size(A);
    Q = eye(m);  % orthogonal transform so far
    R = A;       % transformed matrix so far
    % This makes sure that we can use A transpose as well
    if (m <= n)
        n = m;
    end
    for j = 1:n
        % Find H = I - tau*w*w' to put zeros below R(j,j)
        normx = norm(R(j:end,j));
        s = -sign(R(j,j));
        u1 = R(j,j) - s*normx;
        w = R(j:end,j)/u1;
        w(1) = 1;
        tau = -s*u1/normx;
        % Update Q and R
        R(j:end,:) = R(j:end,:) - (tau*w)*(w'*R(j:end,:));
        Q(:,j:end) = Q(:,j:end) - (Q(:,j:end)*w)*(tau*w)';
    end
end
```
For example, try running this code:
```matlab
lambda = 0.15;
A = [1 2 3 4 5 6 7 8 9 10;
     4 1 5 6 2 2 5 6 2 10;
     3 5 6 1 6 8 4 2 9 5;
     2 4 5 6 2 1 7 9 6 4];
b = (1:4)';
x1 = linsolve(A, b)
x2 = stls_regression(A, b, lambda)
b1 = A*x1
b2 = A*x2
```
You will get $x_1 \neq x_2$ because $x_2$ is sparse, but $Ax_1 = Ax_2$ up to numerical precision.
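For intuition, here is a minimal NumPy sketch of the same experiment. This is a translation for illustration, not the MATLAB code above; `np.linalg.lstsq` stands in for `linsolve2`:

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [4, 1, 5, 6, 2, 2, 5, 6, 2, 10],
              [3, 5, 6, 1, 6, 8, 4, 2, 9, 5],
              [2, 4, 5, 6, 2, 1, 7, 9, 6, 4]], dtype=float)
b = np.arange(1.0, 5.0)
lam = 0.15

# Plain least-squares solution (minimum-norm, since A is wide)
x1, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sequential thresholded least squares
x2 = x1.copy()
for _ in range(10):
    small = np.abs(x2) < lam   # coefficients below the threshold
    x2[small] = 0.0
    big = ~small
    if big.any():
        x2[big], *_ = np.linalg.lstsq(A[:, big], b, rcond=None)
```

Since $A$ has full row rank here, both solutions reproduce $b$, while `x2` has zeros wherever coefficients fell below the threshold.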
Question:
The function `linsolve` in MATLAB uses LUP factorization if $A$ is square and QR factorization if $A$ is non-square. Both are computationally expensive. But in this case, $A$ is transformed into the square system $A^* = A'A$ with $b^* = A'b$ before I run Sequential Thresholded Least Squares, so the goal is to solve $A'Ax = A'b$.
What is then the fastest way for me to minimize
$$J(x) = \|A^*x - b^*\|_2^2$$
if I only care about $x$ being sparse and precision is not so important to me? I have tried:
- Back substitution - Excellent
- Gauss-Jordan - Excellent, just increase lambda to 0.28
- Least squares - Did not work at all.
- LUP factorization - Excellent
- QR factorization - Excellent
- Pseudo inverse - Excellent
Is back substitution the best algorithm in this case?
The $QR$ decomposition of your matrix $A$ transforms the least squares problem as follows:
$$\begin{aligned} \|b-Ax\|_{2} &= \min_{y \in \mathbb{R}^{n}}\|b-Ay\|_{2} \\ \|b-QRx\|_{2} &= \min_{y \in \mathbb{R}^{n}}\|b-QRy\|_{2} \\ \left\|Q^{T}(b-QRx)\right\|_{2} &= \min_{y \in \mathbb{R}^{n}}\left\|Q^{T}(b-QRy)\right\|_{2} \\ \left\|Q^{T}b-Rx\right\|_{2} &= \min_{y \in \mathbb{R}^{n}}\left\|Q^{T}b-Ry\right\|_{2} \\ \|c-Rx\|_{2} &= \min_{y \in \mathbb{R}^{n}}\|c-Ry\|_{2} \quad \text{where } c=Q^{T}b \end{aligned}$$
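As a quick sanity check of this derivation, here is a NumPy sketch (illustrative, since the post's code is MATLAB): after the reduced QR factorization, a single triangular solve on $Rx = Q^T b$ gives the same minimizer as a direct least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))   # m >= n, full column rank
b = rng.standard_normal(100)

Q, R = np.linalg.qr(A)     # reduced QR: Q is 100x10, R is 10x10 upper triangular
c = Q.T @ b
x = np.linalg.solve(R, c)  # solve the small triangular system

# Reference minimizer from a direct least-squares solve
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
```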
Assuming $m \geq n$, you can compare MATLAB's built-in functions $\mathsf{qr}$ and $\mathsf{svd}$ on a matrix $\mathsf{A=B+B'}$ with $\mathsf{B=rand(100)}$ and $\mathsf{b=rand(100,1)}$. Using MATLAB's built-in function $\mathsf{timeit}$ to measure the run time of each version on $\|b-Ax\|_{2}$, you will conclude that $QR$ factorization is faster than the $SVD$.
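The same comparison can be sketched outside MATLAB. Here is a NumPy/`timeit` version of that experiment (matrix size and repetition count are assumptions, and the function names are mine):

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
B = rng.random((100, 100))
A = B + B.T                  # symmetric, almost surely nonsingular
b = rng.random(100)

def solve_qr():
    # Factor A = QR, then solve the triangular system R x = Q' b
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ b)

def solve_svd():
    # Factor A = U diag(s) V', then apply the pseudoinverse to b
    U, s, Vt = np.linalg.svd(A)
    return Vt.T @ ((U.T @ b) / s)

t_qr = timeit.timeit(solve_qr, number=20)
t_svd = timeit.timeit(solve_svd, number=20)
# Both give the same x; QR is typically the faster of the two
```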
It is worth noting that the $LU$ decomposition costs $mn^{2}-n^{3}/3$ flops assuming $m \geq n$, about half as many as the $QR$ decomposition, which costs $2mn^{2}-2n^{3}/3$ flops.
The choice of algorithm in MATLAB is essentially based on how much structure your matrix has that can be exploited for better performance. Sometimes you might not have desirable structure such as symmetry, and then $LU$ can be slow; even so, for a square $m \times m$ matrix it is still faster than $QR$, since $LU$ costs $\frac{2}{3}m^{3}$ flops while $QR$ costs $\frac{4}{3}m^{3}$.