Assume we have $N$ measurements $z_1, \dots, z_N \in \mathbb{R}^{n_z}$ that are generated by
$$ z_i = M v_i + e_i $$
where $v_i \in \mathbb{R}^{n_v}$, $n_v < n_z$, and $e_i$ is an error sampled from the normal distribution $\mathcal{N}(0, \sigma^2 I)$. Here $M$ is assumed to have full rank but is unknown.
Given only the measurements $z_i$ and no information about $M$, $v_i$ and $e_i$, my task is to find a matrix $B$ such that
$$ \hat v_i \approx B \hat z_i $$
for new samples $(\hat v_i, \hat z_i)$ generated from the same distributions as before. This is generally referred to as blind signal separation (BSS).
Now my question is the following: I don't know the distribution of $v_i$ at all, but I do know that the entries $v_{i,j}$ will always be close to $0$ for all indices $j=1,\dots,n_v$ except for a small number of indices, at most $k$ of them (with roughly $k/n_v \leq 0.2$). So $v_i$ is always "approximately sparse".
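To make the setup concrete, here is a minimal synthetic-data sketch of the model above. Everything beyond the model itself is an assumption made for illustration only: the Gaussian $M$, the `leak` level of the near-zero entries, the amplitude range of the active entries, and the function name `simulate`.

```python
import numpy as np

def simulate(N=500, n_z=64, n_v=20, k=4, sigma=0.05, leak=0.01, seed=0):
    """Draw (V, Z, M) from z_i = M v_i + e_i with approximately sparse v_i."""
    rng = np.random.default_rng(seed)
    # Unknown mixing matrix; a Gaussian matrix has full column rank almost surely.
    M = rng.standard_normal((n_z, n_v))
    # Small "leak" values everywhere: entries are near 0, not exactly 0.
    V = leak * rng.standard_normal((N, n_v))
    for i in range(N):
        # Between 1 and k active indices per sample (k / n_v = 0.2 by default).
        n_active = int(rng.integers(1, k + 1))
        active = rng.choice(n_v, size=n_active, replace=False)
        V[i, active] += rng.uniform(0.5, 1.5, size=n_active)
    # Isotropic Gaussian noise e_i ~ N(0, sigma^2 I).
    E = sigma * rng.standard_normal((N, n_z))
    Z = V @ M.T + E  # row i is z_i = M v_i + e_i
    return V, Z, M

V, Z, M = simulate()
print(Z.shape, np.mean(np.abs(V) > 0.1))  # shape of the data, fraction of "large" entries
```

Only `Z` would be available to a BSS algorithm; `V` and `M` are kept here so that a candidate $B$ can be checked against the ground truth.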
Is there a good way to use this information to improve existing BSS algorithms, so that the resulting $B$ is more likely to produce estimates $\hat v_i$ with the sparsity property above? Are there BSS algorithms that are especially well suited to this special case? Or is there a clever way to reformulate the BSS problem so that it includes the sparsity property?
To explain a bit about my application: I want to build software that is fed piano music and extracts the notes being played. For this, I take several MP3 files and generate samples $z_i$ by wavelet-transforming the sound signal at different time stamps. So a $z_i$ is basically the vector of amplitudes at different frequencies at a certain time. Then $v_i$ is the vector of intensities of the different notes being played (obviously unknown, or at least not practical to look up). Keep in mind that if you play a single note, $v_{i,j}=\delta_{ij}$, you get all of its harmonic frequencies in $z_i$, which makes it difficult to filter out.
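The harmonic overlap can be illustrated with a toy mixing matrix. This is only a sketch under assumptions that are not part of my actual data: linearly spaced frequency bins, Gaussian-shaped peaks, harmonic amplitudes decaying like $1/h$, and the hypothetical helper `harmonic_mixing_matrix`.

```python
import numpy as np

def harmonic_mixing_matrix(f0s, freqs, n_harmonics=5, width=10.0):
    """Column j is a toy spectrum template of note j: bumps at f0, 2*f0, ..."""
    M = np.zeros((freqs.size, len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            # Gaussian bump at the h-th harmonic, amplitude 1/h (assumed decay).
            M[:, j] += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2) / h
    return M

freqs = np.linspace(0.0, 4000.0, 801)            # frequency bins in Hz, 5 Hz apart
f0s = 440.0 * 2.0 ** (np.arange(-12, 13) / 12)   # A3 .. A5, equal temperament
M = harmonic_mixing_matrix(f0s, freqs)

v = np.zeros(len(f0s))
v[12] = 1.0                                      # a single note: A4 = 440 Hz
z = M @ v                                        # noise-free measurement

# Energy appears at 440 Hz and at 880 Hz, where the note A5 also has its
# fundamental, so the columns of M for A4 and A5 overlap.
print(z[88], z[176])
```

In this toy model the columns for notes an octave apart are strongly correlated, which is the difficulty described above.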