I need some guidance on the proof of one of the equations published by Belge, M., et al. 2002, Inverse Problems, 18(4), p.1161. It discusses a multi-constrained regularization approach:
$$\boldsymbol{f}^{*}\left(\boldsymbol{\alpha}\right) = \arg\min_{\boldsymbol{f}} \left\{\|\boldsymbol{g} - \boldsymbol{H}\boldsymbol{f}\|_2^2+\sum\limits_{i=1}^{M}\alpha_i\,\Phi_i\left(\boldsymbol{R}_i\boldsymbol{f}\right)\right\}, \qquad \boldsymbol{R}_i,\boldsymbol{H}\in \mathbb{R}^{m\times n}$$
where $M$ is the number of constraints, $\boldsymbol{\alpha} = \left[\alpha_1, \alpha_2,\cdots, \alpha_M\right]^T$, the $\boldsymbol{R}_i$ are regularization operators with corresponding regularization parameters $\alpha_i$, $\Phi_i\left(\boldsymbol{R}_i\boldsymbol{f}\right)=\sum_{j=1}^{m}\phi_i\left(\left[\boldsymbol{R}_i\boldsymbol{f}\right]_j\right)$, and $\left[\boldsymbol{R}_i\boldsymbol{f}\right]_j$ denotes the $j$th element of the vector $\boldsymbol{R}_i\boldsymbol{f}$. In addition, each $\phi_i\left(t\right)$ is a continuously differentiable, convex, non-negative ($\phi_i\left(t\right) \geqslant 0\ \forall t$) even function.
By taking the gradient with respect to $\boldsymbol{f}$ and setting the result equal to zero, we obtain the following first-order condition, which must be satisfied by $\boldsymbol{f}^*\left(\boldsymbol{\alpha}\right)$:
$$\boldsymbol{K}_{f^*}\boldsymbol{f}^* = \boldsymbol{H}^T\boldsymbol{g}$$
where
$$\boldsymbol{K}_{f^*}=\boldsymbol{H}^T\boldsymbol{H}+\frac 12 \sum\limits_{i=1}^{M}\alpha_i\boldsymbol{R}_i^T \underset{k=1,\cdots,m}{\rm{diag}} \left[\frac{\phi'_i\left(\left[\boldsymbol{R}_i\boldsymbol{f}^*\right]_k\right)}{\left[\boldsymbol{R}_i\boldsymbol{f}^*\right]_k}\right]\boldsymbol{R}_i$$
Could someone kindly give me a hint as to how the term $\frac 12 \sum\limits_{i=1}^{M}\alpha_i\boldsymbol{R}_i^T \underset{k=1,\cdots,m}{\rm{diag}} \left[\frac{\phi'_i\left(\left[\boldsymbol{R}_i\boldsymbol{f}^*\right]_k\right)}{\left[\boldsymbol{R}_i\boldsymbol{f}^*\right]_k}\right]\boldsymbol{R}_i$ is obtained?
I applied the chain rule for composite functions, but I can't get it right.
If we apply a function $\phi$ element-wise to a vector $w$, the result is a vector $$v = \phi(w)$$ whose differential (also a vector) can be expressed using an element-wise (Hadamard) product, where $\phi'=\phi'(w)$ is likewise evaluated element-wise: $$\eqalign{ dv &= \phi'\circ dw \cr &= {\rm diag}(\phi')\,dw \cr }$$
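This diagonal-Jacobian fact is easy to confirm numerically. Below is a quick NumPy sketch using $\phi(t)=\sqrt{t^2+1}$ as an arbitrary smooth, even, convex choice (my choice, not from the paper): the finite-difference Jacobian of the element-wise map matches ${\rm diag}(\phi'(w))$.

```python
import numpy as np

# phi(t) = sqrt(t^2 + 1): a smooth, even, convex example penalty
phi = lambda t: np.sqrt(t**2 + 1.0)
dphi = lambda t: t / np.sqrt(t**2 + 1.0)  # phi'(t)

rng = np.random.default_rng(0)
w = rng.standard_normal(5)

# Central finite-difference Jacobian of the element-wise map v = phi(w)
eps = 1e-6
J = np.zeros((5, 5))
for j in range(5):
    e = np.zeros(5)
    e[j] = eps
    J[:, j] = (phi(w + e) - phi(w - e)) / (2 * eps)

# The Jacobian is diagonal, with phi'(w) on the diagonal
assert np.allclose(J, np.diag(dphi(w)), atol=1e-6)
```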
For your problem, let $w=Rf$ and assume there is only one constraint.
Let's find the gradient of that constraint $$\eqalign{ \Phi &= 1^Tv \cr d\Phi &= 1^Tdv = 1^T{\rm diag}(\phi')\,dw = 1^T{\rm diag}(\phi')R\,df \cr \frac{\partial\Phi}{\partial f} &= R^T{\rm diag}(\phi')\,1 \cr }$$ Now let's look at the gradient of the original function plus the constraint $$\eqalign{ \lambda &= \|Hf-g\|^2 + \alpha\Phi \cr \frac{\partial\lambda}{\partial f} &= 2H^T(Hf-g) + \alpha R^T{\rm diag}(\phi')\,1 \cr }$$ Setting the gradient to zero yields $$\eqalign{ H^Tg &= H^THf + \frac{1}{2}\alpha R^T{\rm diag}(\phi')\,1 \cr }$$ Now the authors use a simple trick to replace the $1$ on the far RHS with the element-wise identity $$\eqalign{ 1 &= {\rm diag}\Big(\frac{1}{Rf}\Big)\,Rf \cr }$$ (valid componentwise wherever $[Rf]_k\ne 0$; since $\phi$ is even, $\phi'(0)=0$, so at $[Rf]_k=0$ the ratio $\phi'(t)/t$ is understood as its limit when that limit exists). This leaves us with $$\eqalign{ H^Tg &= H^THf + \frac{1}{2}\alpha R^T{\rm diag}\Big(\frac{\phi'}{Rf}\Big)Rf \cr &= Kf \cr }$$ The function with multiple constraints has the same form; just put subscripts on the $(R,\phi,\alpha)$ symbols and sum over them.
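The resulting fixed-point equation $K(f^*)\,f^* = H^Tg$ can be checked numerically for the one-constraint case. A sketch, under my own assumptions: $\phi(t)=\sqrt{t^2+1}$ as the penalty, random small $H$, $R$, $g$, and SciPy's generic BFGS minimizer to locate $f^*$ (this is just a verification tool, not the authors' algorithm).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 8, 6
H = rng.standard_normal((m, n))
R = rng.standard_normal((m, n))
g = rng.standard_normal(m)
alpha = 0.7

phi = lambda t: np.sqrt(t**2 + 1.0)
dphi = lambda t: t / np.sqrt(t**2 + 1.0)  # phi'(t)

# Objective ||Hf - g||^2 + alpha * sum_j phi([Rf]_j) and its gradient
obj = lambda f: np.sum((H @ f - g) ** 2) + alpha * np.sum(phi(R @ f))
grad = lambda f: 2.0 * H.T @ (H @ f - g) + alpha * R.T @ dphi(R @ f)

res = minimize(obj, np.zeros(n), jac=grad, method="BFGS", tol=1e-10)
f = res.x

# K(f) as in the boxed formula: diag weights phi'([Rf]_k) / [Rf]_k
u = R @ f
K = H.T @ H + 0.5 * alpha * R.T @ np.diag(dphi(u) / u) @ R

# First-order condition K(f*) f* = H^T g holds at the minimizer
assert np.allclose(K @ f, H.T @ g, atol=1e-6)
```

Note that for this particular $\phi$, the weight simplifies to $\phi'(t)/t = 1/\sqrt{t^2+1}$, so the diagonal entries are bounded and well-behaved even for small $[Rf]_k$.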