I am looking to learn more about compressed sensing algorithms (especially parallelizable ones). I am familiar with some formulations of the problem: simultaneously minimize the L2 norm of the mismatch between the recovered signal and the original subsamples at the sampled locations, and the L1 norm of the recovered signal's coefficients in a sparsifying transform. But I am unable to work out the algorithms themselves, the formal convergence proofs behind their iterations, or good initialization strategies (should I fill in the known samples and set the remaining entries to 0, randomize the remaining entries based on some statistical parameters, or start without any initial guess?). I have taken a course on convex optimization, and while it was really in depth, I am still exploring this area.
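To make sure I am describing the formulation correctly, here is how I understand it written out (the symbols are my own notation: $y$ the subsampled measurements, $A$ the subsampling/sensing matrix, $\Psi$ the sparsifying transform, $\lambda$ a regularization weight):

$$\min_{x} \; \tfrac{1}{2}\,\|Ax - y\|_2^2 \;+\; \lambda\,\|\Psi x\|_1$$

or, in the constrained form I have also seen, $\min_x \|\Psi x\|_1$ subject to $\|Ax - y\|_2 \le \varepsilon$.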
The L2 minimization can be done with gradient descent. For the L1 minimization, I gather the choice of algorithm would depend on the sparsifying basis, e.g. the Fourier transform, discrete cosine transform, or a wavelet transform. I am unable to understand how the algorithms for this minimization actually work (L1-magic, LASSO solvers, etc.). On the same note, is there any way to discover the basis in which a signal is sparse, given some prior understanding of the signal? I would like to find a sparse basis for my data (as a starting point, I have some idea of the physics of the system and the likely errors), but I am not sure where to begin. This looks like a machine learning problem (dictionary learning), but I am not sure which particular area to focus on. I suspect there is a wavelet basis in which my signal would be fairly sparse, but I have no idea how to get to that point.
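To make the question concrete, the one L1 solver I have managed to piece together is a minimal ISTA (iterative soft-thresholding) sketch for the unconstrained form of the problem, assuming sparsity in the canonical basis (i.e. the sparsifying transform is the identity) and a synthetic Gaussian sensing matrix — this is the kind of algorithm whose derivation and convergence guarantees I would like to understand:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=2000):
    """ISTA for min 0.5*||Ax - y||_2^2 + lam*||x||_1.

    Initialized at x = 0 (one of the initialization choices I asked about).
    """
    # Step size 1/L, where L = ||A||_2^2 bounds the Lipschitz constant
    # of the gradient of the smooth L2 term.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                    # gradient of the L2 term
        x = soft_threshold(x - grad / L, lam / L)   # proximal (L1) step
    return x

# Tiny demo: recover a 3-sparse vector of length 100 from 40 random measurements.
rng = np.random.default_rng(0)
n, m = 100, 40
x_true = np.zeros(n)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true
x_hat = ista(A, y, lam=0.01)
print(np.round(x_hat[[5, 37, 80]], 2))
```

Each per-entry shrinkage in the inner loop is independent, which is part of why I suspect this family of methods parallelizes well.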
I would love any resources and guidance on the above.
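For the dictionary learning part, the closest thing I have found to a starting point is scikit-learn's `DictionaryLearning`; the sketch below (entirely synthetic data, my own variable names) is roughly what I mean by "learning a sparse basis from the data" — I would like to know whether this is the right area to dig into:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Synthetic data: 200 signals of length 20, each a sparse mix of 5 hidden atoms.
rng = np.random.default_rng(0)
atoms = rng.standard_normal((5, 20))
codes = rng.standard_normal((200, 5)) * (rng.random((200, 5)) < 0.3)
X = codes @ atoms

# Learn 5 dictionary atoms whose sparse combinations represent the rows of X.
dl = DictionaryLearning(n_components=5, alpha=0.1, max_iter=50, random_state=0)
sparse_codes = dl.fit_transform(X)   # sparse coefficients, shape (200, 5)
learned_atoms = dl.components_       # learned dictionary, shape (5, 20)
```

My real question is how to bias such a procedure with the physics of the system (e.g. restricting to wavelet-like atoms) rather than learning a fully unconstrained dictionary.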