Convex optimization of a convolution problem


I want to solve a sparse deconvolution problem, formulated with discrete convolutions, of the following form:

S [16500x1]: the measured data vector

D [1500x2]: a dictionary matrix containing two basis signals of 1500 samples each

X [16500x2]: the impulse responses to be recovered.

The signal generation has the following structure: $$S[n] = \sum_i D_i[n] * X_i[n] + \eta,$$ where $*$ denotes discrete convolution and $\eta$ is measurement noise. So basically, the signal $S[n]$ is the sum of the discrete convolutions of the $D_i$ with the $X_i$.
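For concreteness, here is a minimal NumPy sketch of this generation model. The sizes are shrunk and all names are my own; the impulse responses are made sparse by placing a handful of spikes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sig, n_dict, n_atoms = 200, 30, 2   # shrunk from 16500 / 1500 / 2

D = rng.standard_normal((n_dict, n_atoms))   # dictionary atoms d_i
X = np.zeros((n_sig, n_atoms))               # sparse impulse responses x_i
X[rng.integers(0, n_sig, 5), 0] = 1.0
X[rng.integers(0, n_sig, 5), 1] = -1.0

# S[n] = sum_i (d_i * x_i)[n] + noise, using 'same'-length convolution
S = sum(np.convolve(X[:, i], D[:, i], mode='same') for i in range(n_atoms))
S += 0.01 * rng.standard_normal(n_sig)
```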

To reconstruct the impulse responses $X_i$ from a measurement of $S$, I use a sparse, convex optimization problem with the following loss function: $$ \epsilon = \Big\|\sum_i D_i * X_i - S \Big\|^2_2 + \lambda \cdot \|X\|_1$$

I want to solve this using simple gradient descent, so I compute the gradient of $\epsilon$ with respect to each $X_i$:

$$\frac{\partial \epsilon}{\partial X_i} = 2\cdot \Big(\sum_j D_j * X_j - S\Big) * D_i + \lambda\cdot \mathrm{sign}(X_i)$$

However, when I implement this, the loss diverges, even with very small step sizes. Only when I time-reverse the $D_i$ in the second convolution (the one applied to the residual) do I get the results I expect. The idea of reversing came from a frequency-domain implementation I saw, where the Hermitian transpose was used in the multiplication (which corresponds to time reversal in the time domain).

So, a long introduction for the question: why do I need to time-reverse the $D_i$ that is convolved with the residual (right before the plus sign)? I do not see where this comes from, based on my knowledge of the chain rule. Any help in understanding this would be greatly appreciated!

Thanks! PS: sorry for the sloppy notation; I'm not a mathematician.

EDIT: added the code:

gradientVecs = zeros( size( X ) );
maxIter = 1000;
stepSize = 0.02;
for cntIter = 1 : maxIter
    % forward model: sum of 'same'-length convolutions over the dictionary
    curDX = 0;
    for cntDict = 1 : size( D, 2 )
        curDX = curDX + conv( X( :, cntDict ), D( :, cntDict ), 'same' );
    end
    curLoss = sum( sum( ( curDX - S ).^2 ) ) + lambda * sum( sum( abs( X ) ) );

    % gradient of the loss w.r.t. each impulse response
    for cntDict = 1 : size( D, 2 )
        % Why is there a flip here????                   HERE???
        gradientVecs( :, cntDict ) = 2 * conv( curDX - S, flip( D( :, cntDict ) ), 'same' ) ...
            + lambda * sign( X( :, cntDict ) );
    end

    X = X - stepSize * gradientVecs;
end
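For what it's worth, a small finite-difference check reproduces the observation. This is my own sketch in NumPy, with shrunk sizes, a single dictionary atom of odd length, and $\lambda = 0$ so the data term is smooth: the gradient built with the flipped atom matches the numerical gradient, while the unflipped one does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sig, n_atom = 50, 7            # shrunk sizes; odd atom so 'same' crops symmetrically
d = rng.standard_normal(n_atom)  # one dictionary atom
S = rng.standard_normal(n_sig)   # "measured" data
X = rng.standard_normal(n_sig)   # current iterate

# data term of the loss (lambda = 0)
loss = lambda x: np.sum((np.convolve(x, d, mode='same') - S) ** 2)

r = np.convolve(X, d, mode='same') - S                    # residual
grad_flipped = 2 * np.convolve(r, d[::-1], mode='same')   # with the flip
grad_naive   = 2 * np.convolve(r, d,       mode='same')   # without the flip

# central finite differences, one coordinate at a time
eps = 1e-6
fd = np.zeros(n_sig)
for k in range(n_sig):
    e = np.zeros(n_sig)
    e[k] = eps
    fd[k] = (loss(X + e) - loss(X - e)) / (2 * eps)

print(np.max(np.abs(fd - grad_flipped)))  # tiny: flipped gradient agrees
print(np.max(np.abs(fd - grad_naive)))    # not tiny: unflipped does not
```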
OK, I found the answer to my own question:

First, let us recap the problem we want to find the gradient for:

$$\epsilon = \Big\| \sum_i d_i*x_i - s \Big\|_2^2 + \lambda \cdot \sum_i \|x_i\|_1$$

We can rewrite the discrete convolution operator in matrix form: $$\sum_i d_i*x_i = \sum_i D_i \cdot x_i,$$ where $D_i$ is a matrix whose columns are shifted versions of $d_i$. $D_i$ is therefore a Toeplitz matrix.
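To make the Toeplitz claim concrete, here is a small NumPy sketch (sizes and names are mine) that builds this matrix for one atom $d$ and checks that multiplying by it reproduces the 'same'-length convolution used in the code above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 12, 5                       # signal and atom lengths (m odd)
d = rng.standard_normal(m)
x = rng.standard_normal(n)

# Build the n-by-n 'same'-convolution matrix: column k holds d shifted to k
Dmat = np.zeros((n, n))
off = (m - 1) // 2                 # centering used by mode='same'
for k in range(n):
    for j in range(m):
        row = k + j - off
        if 0 <= row < n:
            Dmat[row, k] = d[j]

# Matrix-vector product equals the convolution
assert np.allclose(Dmat @ x, np.convolve(x, d, mode='same'))
```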

Then, we can rewrite the loss function: $$\epsilon = \Big\| \sum_i D_i \cdot x_i - s \Big\|_2^2 + \lambda \cdot \sum_i \|x_i\|_1$$

We can then write out the $L_2$ norm: $$ \epsilon = \Big( \sum_i D_i \cdot x_i - s \Big)^T\cdot\Big( \sum_i D_i \cdot x_i - s \Big) + \lambda \cdot \sum_i \|x_i\|_1$$

Expanding the product gives: $$ \epsilon = \sum_{i,j} x_i^T D_i^T D_j x_j - 2s^T\sum_i D_i x_i + s^Ts + \lambda \cdot \sum_i \|x_i\|_1$$

We can then take the gradient with respect to a single $x_i$ (only the terms containing $x_i$ contribute): $$ \frac{\partial \epsilon}{\partial x_i} = 2D^T_i\sum_j D_j x_j - 2D_i^Ts + \lambda \cdot \mathrm{sign}(x_i)$$

$$ \frac{\partial \epsilon}{\partial x_i} = 2\cdot D^T_i \cdot\Big(\sum_j D_j x_j -s\Big) + \lambda \cdot \mathrm{sign}(x_i)$$

The multiplication by $D_i^T$ in front of the residual $\sum_j D_j x_j - s$ is, because of the Toeplitz structure of $D_i$, exactly a convolution with the time-reversed $d_i$ (equivalently, a correlation with $d_i$). That is where the flip in the code comes from.
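This last step can also be checked numerically. In the NumPy sketch below (sizes and names are mine; the atom length is odd so the 'same' crop is symmetric), multiplying a residual by $D^T$ gives the same vector as convolving it with the time-reversed $d$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 12, 5                        # m odd so the 'same' crop is symmetric
d = rng.standard_normal(m)
r = rng.standard_normal(n)          # plays the role of the residual D x - s

# 'same'-convolution matrix for d (columns are shifted copies of d)
off = (m - 1) // 2
Dmat = np.array([[d[i - k + off] if 0 <= i - k + off < m else 0.0
                  for k in range(n)] for i in range(n)])

lhs = Dmat.T @ r                               # multiplication by D^T
rhs = np.convolve(r, d[::-1], mode='same')     # convolution with flipped d
assert np.allclose(lhs, rhs)
```

Note that with an even-length $d$, the 'same' crop is asymmetric and the transpose picks up a one-sample shift, which is why the sketch uses an odd atom length.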