Bases in compressed sensing (signal reconstruction)


I have been posting this kind of question on Cross Validated, but since this one deals almost entirely with mathematics, I will post it here.

In signal reconstruction using compressed sensing, we want to sample a signal $f$ in order to obtain a smaller (compressed) signal $b$:

$$b = \Phi f$$

However, $f$ can also be expressed by a linear combination of basis functions $\Psi$ and its coefficients $c$:

$$f = \Psi c$$

So the first equation and second equation together produce:

$$b = \Phi \Psi c$$

Furthermore, we also want to promote the sparsity of $c$. This is done by solving the following optimization problem:

$$\min \|c\|_{\ell_1} \text{ subject to } b = \Phi \Psi c$$

where $\|c\|_{\ell_1}$ is the $\ell_1$-norm.
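To make the setup concrete, here is a small numerical sketch (my own, with made-up data) of this $\ell_1$ problem. It uses the standard trick of splitting $c = u - v$ with $u, v \ge 0$ so that the minimization becomes a linear program, solvable with SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 15, 30
A = rng.standard_normal((m, n))     # stands in for Phi @ Psi
c_true = np.zeros(n)
c_true[[3, 17]] = [1.0, -2.0]       # a 2-sparse coefficient vector
b = A @ c_true                      # compressed measurements

# Split c = u - v with u, v >= 0, so ||c||_1 = sum(u + v) and the
# problem becomes a linear program over the stacked vector [u; v].
cost = np.ones(2 * n)
res = linprog(cost, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
c_hat = res.x[:n] - res.x[n:]
print(np.linalg.norm(c_hat - c_true))   # typically tiny: the sparse c is recovered
```

With $m = 15$ random measurements of a 2-sparse length-30 vector, basis pursuit typically recovers $c$ exactly.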

I think I understand this part but I'd like to know more about the bases $\Psi$. How are they chosen? How are they implemented in an algorithm? Right now I'm trying to understand a couple of examples that use the following $\Psi$ and $\Phi$:

1.

D=dct(eye(n,n)); % \Psi
A=D(perm,:); % \Phi \Psi

where dct is the discrete cosine transform, n is the dimension of the original signal, and perm consists of the first m entries of a randomly generated permutation:

r1 = permutation(arange(1, n))
perm = r1[0:m]
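For reference, here is my own Python sketch of those MATLAB lines (assuming MATLAB's dct, which is the orthonormal DCT-II, hence norm='ortho'): D holds the DCT matrix, and keeping m random rows of D is the same as multiplying by a row-sampling $\Phi$:

```python
import numpy as np
from scipy.fft import dct

n, m = 8, 4
D = dct(np.eye(n), axis=0, norm='ortho')   # orthonormal DCT matrix: Psi
rng = np.random.default_rng(1)
perm = rng.permutation(n)[:m]              # first m entries of a random permutation
A = D[perm, :]                             # Phi @ Psi: m sampled rows of D

Phi = np.eye(n)[perm, :]                   # the implied Phi: identity rows
print(np.allclose(A, Phi @ D))             # True
```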

2.

Atest = zeros((nx, ny)).reshape(1, nx*ny)
Adelta = zeros((k, nx*ny))
for i, j in enumerate(r1k):
    Atest[0,j] = 1
    Adelta[i, :] = dct(Atest)
    Atest[0, j] = 0

here nx and ny are the dimensions of the original signal, and r1k is also a random permutation (similar to perm). Adelta is produced by selecting a position in the matrix Atest, transforming Atest using dct, and adding the result as a row of the matrix Adelta.
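One way to see what that loop builds (a self-contained sketch with a made-up size N standing in for nx*ny): each row of Adelta is the DCT of a unit vector, so the whole loop just assembles sampled rows of the DCT-of-identity matrix:

```python
import numpy as np
from scipy.fft import dct

N = 16                                   # stands in for nx*ny
rng = np.random.default_rng(2)
r1k = rng.permutation(N)[:5]             # the sampled positions

Atest = np.zeros((1, N))
Adelta = np.zeros((len(r1k), N))
for i, j in enumerate(r1k):
    Atest[0, j] = 1.0                    # the j-th unit vector as a row
    Adelta[i, :] = dct(Atest[0])         # its DCT becomes row i of Adelta
    Atest[0, j] = 0.0                    # reset for the next iteration

# The loop is equivalent to sampling rows of the full DCT-of-identity matrix
print(np.allclose(Adelta, dct(np.eye(N), axis=1)[r1k, :]))  # True
```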

In these pieces of code, I know A and Adelta represent $\Phi \Psi$, but I don't really understand why. I'd appreciate a few comments about this.

By the way, if you want to take a look at the full example, here is one using MATLAB and this one in Python. This example is also related to my other question in Cross Validated, so if you want to contribute to that also, you are more than welcome, although it deals with an implementation issue which may not be interesting enough.

UPDATE:

An additional question: how should we deal with the actual reconstruction? According to the second equation, $f = \Psi c$, once we know $c$, an approximation of $f$ is found simply by computing $\Psi c$. However, what I see in practice is the direct application of a DCT to the coefficients $c$. Why?

Thanks


3 Answers


I will not go into the details of your code (which is not as readable as it should be); instead, I will give you an abstract description. $\Phi$ is the matrix that represents the way you sample your signal $f$. For example, if you are subsampling, i.e. you sample only a subset of the entries of $f$, then $\Phi$ is the identity matrix with rows removed. On the other hand, $\Psi$ represents the basis in which you choose to expand your signal $f$. For example, if you know that $f$ is a time series containing only a couple of frequencies, then you can take $\Psi$ to be the discrete Fourier (or discrete cosine) matrix and solve for $c$. Note that $c$ will be sparse, with nonzero entries associated with the distinct frequencies.
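The subsampling case can be sketched in a few lines (my toy example, not from the question's code): an identity matrix with rows removed simply picks out entries of $f$.

```python
import numpy as np

n = 6
f = np.arange(n, dtype=float)         # a toy signal
keep = [0, 2, 5]                      # the entries we sample
Phi = np.eye(n)[keep, :]              # identity matrix with rows removed
print(np.allclose(Phi @ f, f[keep]))  # True: Phi @ f just picks entries of f
```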


To take case 1, you know that $A=\Phi\Psi$. The DCT matrix, $D$, is $\Psi$. The value of $\Phi$ is implied by the random selection of rows in the line A = D(perm,:);. This is equivalent to multiplying D by a row-selection matrix: A = P * D. To give a specific example, say that D is a 5x5 matrix and you want to make the 2x5 matrix A that is made up of the rows 4 and 2 of D. Then

$$ P= \left[ \begin{array}{ccccc} 0 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 0\\ \end{array} \right] $$
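This equivalence is easy to check numerically (a quick sketch of mine, with an arbitrary 5x5 stand-in for the DCT matrix):

```python
import numpy as np

D = np.arange(25, dtype=float).reshape(5, 5)  # any 5x5 stand-in for D
P = np.array([[0., 0., 0., 1., 0.],           # the row-selection matrix above
              [0., 1., 0., 0., 0.]])
print(np.allclose(P @ D, D[[3, 1], :]))       # True: rows 4 and 2 (1-based)
```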

The following are good background material on the theoretical aspects of the problem:

Uncertainty Principles and Signal Recovery

Optimally Sparse Representation in General Nonorthogonal Dictionaries

Sparsity and Incoherence in Compressive Sampling

Decoding by Linear Programming

Uncertainty Principles and Ideal Atomic Decomposition


You say "but I'd like to know more about the bases $\Psi$. How are they chosen? How are they implemented in an algorithm? "

$\Psi$ is chosen so that $c$ is sparse. If $\Psi$ is equal to the identity, then the signal is sparse in the canonical basis.

Let us imagine the signal is sparse in the canonical basis (i.e. $\Psi = \mathbf{I}$); then the difference between a simple sampling problem and a compressive sensing problem is the difference between $\Phi$ being the identity and $\Phi$ being an acceptable compressive sensing measurement matrix.

How is it implemented in an algorithm? $\Psi$ is either an explicit matrix whose columns are the discretized versions of the basis functions, or just a subroutine that applies the transform.
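Those two options agree, as a small sketch shows (my choice of basis here: the orthonormal inverse DCT, so the columns of the explicit matrix are the discretized cosine basis vectors):

```python
import numpy as np
from scipy.fft import idct

n = 8
c = np.zeros(n)
c[2] = 1.0                                   # one active coefficient

Psi = idct(np.eye(n), axis=0, norm='ortho')  # columns = discretized basis vectors
f_matrix = Psi @ c                           # explicit matrix product

f_op = idct(c, norm='ortho')                 # subroutine: never form Psi at all
print(np.allclose(f_matrix, f_op))           # True
```

For large signals the subroutine form is preferred, since it avoids storing an n-by-n matrix and uses the fast transform.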