For 2 months I have been studying sparse recovery and its applications for image classification and I have found that it's a broad area in mathematics which gives rise to a wide variety of regularization techniques. However, I am struggling to understand and implement the sparse recovery of the sparse vector by $L_1$ shrinkage iteration based on prior knowledge of labels.
Actually, I have a dictionary size $200 \times 58000$ (fat matrix) which constructed by a prior knowledge of classes (which needs to be redundant), and I have the knowledge of labels and I try to find the $L^1$ optimal solution using soft-shrinkage iteration which is as follows:
We have the linear combination. $$y = Dx$$ and we use the least squares method to minimize $x$
$$\|y - Dx\|^2 + \lambda\|x\|_1\to\min_x$$
My question is, how to compute the $x$ first and then create an iterative method to update its quantity?
Or should I initialize $x$ to e.g $0$ and then wrap it with an iterative method to be updated? if so, how to formulate that particular iterative method?
Appreciate your help and response.
Edited
Is the following equation the derivation of least squire with $L_1$? How can I end up to the following equation?
Note: Of course the superscripts denote the number of iterations.
$$x^{n+1} = S\lambda\ /2(x^n + D^T(y - D x^n)) $$
You may solve the minimisation problem
$$\|y-Dx\|+\lambda\|x\|_1\to\min_x$$
using the so-called Fast Iterative Shrinkage Thresholding Algorithm (FISTA).
It's an iterative method which essentially consists of setting of setting the step size $L$ as the largest eigenvalue of $D^\ast D$, the initial values as $z_1=x_0$, $t_1=1$ and iterating
$$z_k=\operatorname{prox}_{L\frac{1}{\lambda}\|\cdot\|_1}(y-D^\ast(y-Dx_k)),$$ $$t_{k+1}=\frac{1+\sqrt{1+4t^2_k}}{2},$$ $$x_{k+1}=z_k+\left(\frac{t_k-1}{t_{k-1}}\right)(z_k-z_{k-1}).$$
Note that the nice property of the $\|\cdot\|_1$ regularisation term is that it has a closed form proximal mapping operator:
$$\operatorname{prox}_{\alpha\|\cdot\|_1}(x)=\operatorname{sign}(x_i)\max\{|x_i|-\alpha,0\},$$ which is just the soft thresholding operator.
You can read more about FISTA here: FISTA