Gradient of Predictive Sparse Decomposition Cost function


I am trying to minimize the following cost function with respect to the matrices $X_m$.

$$ Energy = f(X) = \frac{1}{2}||I-\sum_{m=1}^{M}{C_m * X_m}||_2^2+\sum_{m=1}^{M}{||X_m-\phi(W_m * I)||_2^2}+\lambda|X|_1 $$

$$ X_{\min}=\arg{ \min_{X}{f(X)}} $$

with

$I:$ Input image (size: w x h)

$C_1 ... C_M:$ Decoder matrices (size: s x s)

$W_1 ... W_M:$ Encoder matrices (size: s x s)

$X_1 ... X_M:$ Sparse matrices (size: (w+s-1) x (h+s-1))

$C_m * X_m:$ The 2D convolution between $C_m$ and $X_m$. (In matlab: conv2(X,C,'valid'))

$W_m * I:$ The 2D convolution between $W_m$ and $I$. (In matlab: conv2(I,W,'full'))

$\phi(...):$ An activation function.

$||...||_2$: The L2-norm of a matrix.

$|...|_1:$ The L1-norm of a matrix.
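For concreteness, the energy above can be sketched in Python, with `scipy.signal.convolve2d` standing in for MATLAB's `conv2` (the activation $\phi$ is left unspecified in the post; `tanh` is used here purely as a placeholder assumption):

```python
import numpy as np
from scipy.signal import convolve2d

def phi(z):
    # Placeholder activation; the post does not specify phi.
    return np.tanh(z)

def energy(I, C, W, X, lam):
    """Evaluate f(X) for lists C, W, X of M matrices each.

    I : (w, h) image; C[m], W[m] : (s, s) filters;
    X[m] : (w+s-1, h+s-1) sparse code maps.
    """
    # Reconstruction residual: I - sum_m conv2(X_m, C_m, 'valid')
    recon = sum(convolve2d(Xm, Cm, mode='valid') for Xm, Cm in zip(X, C))
    residual = I - recon
    # Prediction term: sum_m ||X_m - phi(conv2(I, W_m, 'full'))||_2^2
    pred = sum(np.sum((Xm - phi(convolve2d(I, Wm, mode='full'))) ** 2)
               for Xm, Wm in zip(X, W))
    # Sparsity term: lambda * |X|_1
    sparsity = lam * sum(np.sum(np.abs(Xm)) for Xm in X)
    return 0.5 * np.sum(residual ** 2) + pred + sparsity
```

Note how the shapes line up: a valid convolution of an (s x s) filter with a ((w+s-1) x (h+s-1)) map yields a (w x h) image, matching $I$.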

I am trying to minimize the energy using the gradient descent algorithm:

$$ X_n=X_n-\nabla{f(X_n)} $$ where $$ \nabla{f(X_n)} = C_n^{'} * (I-\sum_{m=1}^M C_m * X_m) + (X_n - \phi(W_n * I)) $$

with

$C_n^{'} * z:$ Convolution of the $180^{\circ}$ rotation of $C_n$ with $z$ (In matlab: conv2(z,rot90(C,2),'full'))
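The update can be sketched as follows. This transcribes the rule exactly as written above (no step size, and the $\lambda|X|_1$ term is left out of the gradient, as in the formula), with `scipy.signal.convolve2d` standing in for `conv2`:

```python
import numpy as np
from scipy.signal import convolve2d

def gradient_step(I, C, W, X, phi):
    """One update X_n <- X_n - grad f(X_n), transcribing the posted formula.

    Note: this is the gradient exactly as written in the question; whether
    it is correct is precisely what question 1 asks.
    """
    # Shared residual: I - sum_m conv2(X_m, C_m, 'valid')
    residual = I - sum(convolve2d(Xm, Cm, mode='valid') for Xm, Cm in zip(X, C))
    X_new = []
    for Xn, Cn, Wn in zip(X, C, W):
        # C_n' * z : full convolution with the 180-degree rotation of C_n
        grad = (convolve2d(residual, np.rot90(Cn, 2), mode='full')
                + (Xn - phi(convolve2d(I, Wn, mode='full'))))
        X_new.append(Xn - grad)
    return X_new
```

The full convolution maps the (w x h) residual back to the ((w+s-1) x (h+s-1)) shape of $X_n$, so the update is well-defined elementwise.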

When I run the algorithm, the energy does not decrease; it only grows. So I have two questions:

  1. Is the gradient $\nabla{f(X_n)}$ of the energy function correct?
  2. If the gradient is correct, what else could prevent the energy from decreasing?
Answer:

The problem seems to lie at the borders of the $X$ matrices. Because a full convolution (conv2(I,W,'full')) is used, the $X$ matrices get extra entries from convolving over the zero-padded borders. These border entries, and the corners in particular, depend on fewer values in the image, which produces larger gradients at those locations. The next iteration then tries to rectify the overly large update, resulting in a kind of oscillating behavior that increases the energy rather than minimizing it.

One solution is to change the full convolution (conv2(I,W,'full')) into a valid convolution. Another is to use a mask that reduces the impact of the border entries on the gradient.
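The mask idea could be sketched as follows; `border_mask` is a hypothetical helper, under the assumption that each entry of $X$ is weighted by the fraction of the $s \times s$ window that actually overlaps the image:

```python
import numpy as np
from scipy.signal import convolve2d

def border_mask(img_shape, s):
    """Down-weight border entries of the (w+s-1, h+s-1) code maps.

    Each entry of X is influenced by conv2(I, W, 'full'); near the border
    fewer image pixels contribute, so we scale by the fraction that do.
    """
    counts = convolve2d(np.ones(img_shape), np.ones((s, s)), mode='full')
    return counts / (s * s)  # 1.0 in the interior, < 1 near borders/corners
```

Multiplying the gradient elementwise by this mask (`grad *= border_mask(I.shape, s)`) damps exactly the corner and edge entries whose oversized updates cause the oscillation.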