From what I understand, sparse recovery is about:
Ax = y
I know A, I know y, I want to find a representation of x that is a possible solution.
If the dimension of x is much larger than that of y, the system is underdetermined and sparsity is a good heuristic for picking a solution.
What I want to do is find that x in my case. So I set up my code with A of shape (300, 4000), y of shape (300,), and x of shape (4000,). Then I tried the Orthogonal Matching Pursuit algorithm to recover x.
from sklearn.linear_model import OrthogonalMatchingPursuit
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=len(x))
omp.fit(A, y)
coef = omp.coef_
This code doesn't work; it says the lengths of A and y are not consistent.
So, my question is:
Is my formulation correct? Does sparse recovery do what I am trying to do here, i.e. get the x vector, given A and y?
What algorithm should I use, given that I have a very high-dimensional matrix?
Why does this OMP algorithm require a number of samples? Is it a machine learning algorithm? Am I supposed to provide thousands of (A, y) pairs to get a solution?
It's not just about solving $Ax = y$. It's about choosing, of the assumed infinite solutions $x$, the one with the most zero elements (i.e. the sparsest solution). This is just one of many methods of regularization, and using it specifically is a matter of application. (What are you solving? A machine learning problem? Optimal control? Compressed sensing? They all have different reasons for regularizing toward sparsity.) Also, seeking out the sparse solution is not really a "heuristic" - it is just a desire. The heuristics lie in some of the methods to achieve that desire (for example L1-regularization, which heuristically achieves sparsity very well).
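If sparsity really is what you want, note that in your code `n_nonzero_coefs` should be the number of nonzeros you *expect* in $x$, not `len(x)` - it is the sparsity budget. A minimal sketch on synthetic data with your dimensions (the sparsity level of 10 is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n_samples, n_features, n_nonzero = 300, 4000, 10  # assumed sparsity of 10

# Random sensing matrix and a synthetic sparse ground-truth x
A = rng.standard_normal((n_samples, n_features))
x_true = np.zeros(n_features)
support = rng.choice(n_features, size=n_nonzero, replace=False)
x_true[support] = rng.standard_normal(n_nonzero)
y = A @ x_true  # y has shape (300,), consistent with A's 300 rows

# n_nonzero_coefs is the sparsity budget, far smaller than len(x)
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
omp.fit(A, y)
x_hat = omp.coef_  # shape (4000,), at most 10 nonzero entries
```

In this noiseless, well-conditioned setting OMP recovers the true sparse $x$; no "thousands of (A, y) pairs" are needed - the rows of A are the "samples" sklearn refers to.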
It sounds like you just want *a* solution and don't actually care whether it's sparse. A quick and easy way to do that is to regularize by "least squares", i.e. pick the solution that has the minimum sum of squares of its elements. This is easier because it has a nice closed-form solution that is implemented in numpy:
x = numpy.linalg.pinv(A).dot(y)
(Depending on the situation, iterative methods can be even faster than the pseudoinverse. There is a lot of literature on linear least squares, so I'll let you read up on that. Note that solving under- (and over-) determined linear systems is so common that many linear algebra packages abstract the process away entirely; in MATLAB you would just type x = A\y.)
(And really, if you just want *any* solution $x$, you could arbitrarily fix some of the elements of $x$ until the resulting sub-problem is uniquely determined. But such a method would probably not go over well for the true application at hand here.)
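A minimal sketch of that minimum-norm route, on synthetic data with the question's dimensions; `numpy.linalg.lstsq` returns the same minimum-norm solution for underdetermined systems, without forming the pseudoinverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 4000))  # underdetermined: 300 equations, 4000 unknowns
y = rng.standard_normal(300)

# Minimum-norm solution via the Moore-Penrose pseudoinverse
x_pinv = np.linalg.pinv(A) @ y

# lstsq also returns the minimum-norm solution when A has more columns than rows
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Both solve $Ax = y$ exactly here (A has full row rank with probability 1), but neither solution will be sparse - typically all 4000 entries are nonzero.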