Consider the following problem $$ J(v) = \frac{\lambda}{2}\| g - v \|_2^2 + \sum\limits_{i=1}^m\sum\limits_{j=1}^n \phi_\alpha((\delta_x^hv)_{i,j})+\phi_\alpha((\delta_y^hv)_{i,j}) $$ where $g$ and $v$ are $m\times n$ image matrices, $\phi_\alpha$ is defined by $$ \phi_\alpha(t) = |t| - \alpha \log\left(1+\frac{|t|}{\alpha}\right), $$ and $\delta_x^h$, $\delta_y^h$ are the horizontal and vertical discrete gradient (finite-difference) operators applied to $v$. I need to calculate the gradient of $J$ in order to minimize $J$ by gradient descent and denoise an image.
What I've calculated is $$ \frac{\partial J}{\partial v} = \lambda(v-g) + \Bigg(\frac{(\delta_x^hv)_{i,j}(\delta_{xx}^hv)_{i,j}}{\alpha +|(\delta_x^hv)_{i,j}|} + \frac{(\delta_y^hv)_{i,j}(\delta_{yy}^hv)_{i,j}}{\alpha +|(\delta_y^hv)_{i,j}|}\Bigg)_{1\leq i \leq m, 1 \leq j \leq n} $$ where $(\delta_{xx}^hv)_{i,j}$ and $(\delta_{yy}^hv)_{i,j}$ are the second-order derivatives of the image $v$.
But when I use this for gradient descent, the result is pretty bad: the image is not denoised no matter how I change the number of iterations or the step size. Can somebody point out where I've made a mistake in the gradient of $J$? I suspect that the terms $\delta_{xx}^hv$ and $\delta_{yy}^hv$ might be wrong, but then what is the derivative of the image gradient ($\delta_x^hv$, $\delta_y^hv$), more specifically $$\frac{\partial \delta_x^hv}{\partial v} \text{ and } \frac{\partial \delta_y^hv}{\partial v}\,?$$ How can I calculate it?
Thanks.
Those image "gradients" are really convolutions, so let's denote them by $$\eqalign{ A*V &= \delta^h_xV, \qquad& B*V &= \delta^h_yV \\ d(A*V) &= A*dV, \qquad& d(B*V) &= B*dV \\ }$$ where $(*)$ is the convolution product, $V$ is the image, and $(A,B)$ are the kernel matrices.
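To make this identification concrete, here is a small numpy sketch showing that the forward-difference operator is exactly a convolution with a $1\times3$ kernel. The choice of forward differences with periodic (circular) boundaries is an assumption for illustration; the post does not fix a discretization:

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
V = rng.standard_normal((4, 5))

# Forward difference via np.roll, periodic boundary (an assumed discretization):
# (δx v)[i,j] = v[i,j+1] - v[i,j]
dxV = np.roll(V, -1, axis=1) - V

# The same operator written as a convolution A*V with the 1x3 kernel A.
# (scipy.ndimage.convolve flips the kernel, so [1,-1,0] yields v[j+1]-v[j].)
A = np.array([[1.0, -1.0, 0.0]])
dxV_conv = convolve(V, A, mode='wrap')

assert np.allclose(dxV, dxV_conv)
```

The vertical operator $\delta^h_y$ is the same kernel transposed.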
Given a matrix $X$, define the elementwise functions $$\eqalign{ S &= {\rm sign}(X) &\implies {\tt1} &= S\odot S \\ A &= |X| = S\odot X \quad&\implies X &= S\odot A \\ }$$ where $(\odot)$ denotes the elementwise/Hadamard product.
When the scalar function $\phi$ is applied elementwise to $X$ we can calculate its differential as $$\eqalign{ \phi &= S\odot X - \alpha\log\left({\tt1}+\frac{S\odot X}{\alpha}\right) \\ d\phi &= S\odot dX - \frac{\alpha\,(S\odot dX)}{\alpha{\tt1}+S\odot X} \\ &= \left(S - \frac{\alpha S}{\alpha{\tt1}+S\odot X}\right)\odot dX \\ &= \left(\frac{S\odot S\odot X}{\alpha{\tt1}+S\odot X}\right)\odot dX \\ &= \left(\frac{X}{\alpha{\tt1}+|X|}\right)\odot dX \\ }$$ where $\Big(\frac{X}{Y}\Big)$ denotes elementwise/Hadamard division. Note that the final expression is well defined even where $X$ vanishes, since $\phi_\alpha$ is differentiable at zero.
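As a sanity check, the closed form $\phi_\alpha'(t)=t/(\alpha+|t|)$ can be compared against a central finite difference (a sketch; $\alpha=0.1$ is an arbitrary choice):

```python
import numpy as np

alpha = 0.1  # arbitrary choice for the sketch

def phi(t):
    # φ_α(t) = |t| − α log(1 + |t|/α), applied elementwise
    return np.abs(t) - alpha * np.log1p(np.abs(t) / alpha)

def dphi(t):
    # the closed form derived above: φ'_α(t) = t / (α + |t|)
    return t / (alpha + np.abs(t))

t = np.linspace(-2.0, 2.0, 9)
h = 1e-6
numeric = (phi(t + h) - phi(t - h)) / (2.0 * h)
assert np.allclose(dphi(t), numeric, atol=1e-6)
```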
Applying this to one of the problematic terms: $$\eqalign{ {\cal J}_A &={\tt1}:\phi(A*V) \\ d{\cal J}_A &={\tt1}:\left(\frac{A*V}{\alpha{\tt1}+|A*V|}\right)\odot(A*dV)\\ &= \left(\frac{A*V}{\alpha{\tt1}+|A*V|}\right):(A*dV) \\ &= (JAJ)*\left(\frac{A*V}{\alpha{\tt1}+|A*V|}\right):dV \\ \frac{\partial{\cal J}_A}{\partial V} &= (JAJ)*\left(\frac{A*V}{\alpha{\tt1}+|A*V|}\right) \\ }$$ where a colon denotes the trace/Frobenius product, i.e. $\,M:N={\rm Tr}(M^TN)$.
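The key step is the third line: moving the convolution off $dV$ introduces the flipped kernel $JAJ$. A quick numerical check of the adjoint identity $(A*V):M = V:\big((JAJ)*M\big)$, under circular convolution (an assumed boundary handling) with a deliberately non-centrosymmetric kernel:

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(1)
V = rng.standard_normal((6, 7))
M = rng.standard_normal((6, 7))
A = rng.standard_normal((3, 3))      # arbitrary, NOT centrosymmetric kernel

AV  = convolve(V, A, mode='wrap')    # A * V  (circular convolution)
JAJ = A[::-1, ::-1]                  # reflect the kernel through its center
adj = convolve(M, JAJ, mode='wrap')  # (JAJ) * M

# Frobenius products agree:  (A*V):M  ==  V:((JAJ)*M)
assert np.isclose(np.sum(AV * M), np.sum(V * adj))
```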
The Frobenius and Hadamard products commute, i.e. $\,A:B\odot C=A\odot B:C$
Here $J$ is the exchange matrix, which is used to "flip" the kernel (not to be confused with the objective $\cal J$).
Finally, the full objective splits as $$\eqalign{ {\cal J} &= \frac{\lambda}{2}\|V-G\|^2_F + {\cal J}_A + {\cal J}_B \\ \frac{\partial{\cal J}}{\partial V} &= \lambda(V-G) + \frac{\partial{\cal J}_A}{\partial V} + \frac{\partial{\cal J}_B}{\partial V} \\ }$$ This is very similar to the result that you obtained, but you are calculating the gradient of the gradient by re-using the same kernel, i.e. $$A*(A*V)$$ whereas you need to "reflect" the kernel through its center $$(JAJ)*(A*V)$$ However, if $A$ is centrosymmetric (e.g. a Gaussian) then $JAJ=A$.
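Putting it all together, here is a minimal gradient-descent sketch using the corrected gradient. It assumes forward differences with periodic boundaries (for which convolving with the flipped kernel $JAJ$ is simply the negative backward difference); $\lambda$, $\alpha$, the step size, and the iteration count are arbitrary choices for the sketch:

```python
import numpy as np

def J(v, g, lam, alpha):
    """Objective from the question, with forward differences and
    periodic boundaries (an assumed discretization)."""
    dxv = np.roll(v, -1, axis=1) - v
    dyv = np.roll(v, -1, axis=0) - v
    phi = lambda t: np.abs(t) - alpha * np.log1p(np.abs(t) / alpha)
    return 0.5 * lam * np.sum((g - v) ** 2) + np.sum(phi(dxv) + phi(dyv))

def grad_J(v, g, lam, alpha):
    """Gradient of J: λ(v − g) plus the flipped-kernel (JAJ) convolutions
    applied to φ'(δx v) and φ'(δy v).  For forward differences, the
    flipped kernel gives the negative backward difference."""
    dxv = np.roll(v, -1, axis=1) - v
    dyv = np.roll(v, -1, axis=0) - v
    px = dxv / (alpha + np.abs(dxv))     # φ'(δx v), elementwise
    py = dyv / (alpha + np.abs(dyv))
    adj_x = np.roll(px, 1, axis=1) - px  # (JAJ) * px
    adj_y = np.roll(py, 1, axis=0) - py  # (JBJ) * py
    return lam * (v - g) + adj_x + adj_y

def denoise(g, lam=1.0, alpha=0.5, step=0.05, iters=100):
    """Plain gradient descent on J, starting from the noisy image g."""
    v = g.copy()
    for _ in range(iters):
        v = v - step * grad_J(v, g, lam, alpha)
    return v
```

A finite-difference check of `grad_J` against `J` is a good way to catch exactly the kind of adjoint mistake described above before running the descent loop.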