How to solve $F''_{w_{ab}}$? I want to know the detailed calculation process of solving $F''_{w_{ab}}$.

219 Views Asked by At

There is a objective function:

$f(W)$ = $||W||_{2,1}$

For any element $w_{ab}$ in $W$, we apply $F_{w_{ab}}$ to denote the part of $f(W)$ which is only related to $w_{ab}$.

$F'_{w_{ab}}=(DW)_{ab}$

$F''_{w_{ab}}=(D-D^{3}(W\odot W))_{aa}$, (the main problem)

where $D_{ii}$=$\frac{1}{||w^i||_2}$, $\odot$ denotes the element-wise multiplication.

About $F_{w_{ab}}$: why the second order derivative of F is equation (28)?

Note that: The norm $\|\cdot\|_{2,1}$ of a matrix $W\in\mathbb{R}^{n\times m}$ is defined as

$$ \Vert W \Vert_{2,1} = \sum_{i=1}^n \Vert w^{i} \Vert_2 = \sum_{i=1}^n \left( \sum_{j=1}^m |w_{ij}|^2 \right)^{1/2} $$ where $w^i$ denotes $i^{th}$ row of $W$, $w_{ij}$ denotes a element of $W$.

How to solve $F''_{w_{ab}}$? I want to know the detailed calculation process of solving the above formula.

There some more explicit definition in the following papers (I gave the exact location.)

Some Related Papers:

Efficient and Robust Feature Selection via Joint $l_{2,1}$-Norms Minimization

Graph Regularized Nonnegative Matrix Factorization for Data Representation (Look Page 10, APPENDIX A, PROOFS OF THEOREM 1, formulas (26), (27) and (28))

In Nonnegative matrix factorization by joint locality-constrained and $l_{2,1}$-norm regularization, (Look Page 7, the Proof of Lemma 1: Here, how to obtain the result of $F^{''}_{v_{ab}}$?)

Thank you all for your help.

1

There are 1 best solutions below

6
On BEST ANSWER

Define $$\eqalign{ b &= (W\odot W)\,1 \implies db=2(W\odot dW)1 \cr B &= {\rm Diag}(b),\quad D = B^{-1/2},\quad E=e_a\varepsilon_b^T \cr }$$ Write the $\{2,1\}$ norm in terms of $b$ and find its gradient. $$\eqalign{ f &= \|W\|_{2,1} \cr&= b^{\odot 1/2}:1 \cr df &= \tfrac{1}{2}b^{\odot-1/2}:db \cr &= b^{\odot-1/2}:(W\odot dW)1 \cr &= b^{\odot-1/2}1^T:(W\odot dW) \cr &= (b^{\odot-1/2}1^T\odot W):dW \cr &= B^{-1/2} W:dW \cr F'=\frac{\partial f}{\partial W} &= B^{-1/2} W = DW \cr F_{ab}' &= e_a^TF'\varepsilon_b = E:DW \cr }$$ Now find the gradient of this last quantity. $$\eqalign{ dF_{ab}' &= E:(D\,dW+dD\,W) \cr &= DE:dW + EW^T:dB^{-1/2} \cr &= DE:dW - \tfrac{1}{2}EW^T:B^{-3/2}dB \cr &= DE:dW - \tfrac{1}{2}{\rm diag}(D^3EW^T):db \cr &= DE:dW - {\rm diag}(D^3EW^T):(W\odot dW)1 \cr &= DE:dW - \big({\rm diag}(D^3EW^T)1^T\big)\odot W:dW \cr &= DE:dW - {\rm Diag}(D^3EW^T)W:dW \cr &= \Big(DE - D^3\,{\rm Diag}(EW^T)W\Big):dW \cr F_{ab}'' = \frac{\partial F_{ab}'}{\partial W} &= DE - D^3\,{\rm Diag}(EW^T)W \cr }$$ In the above, $1$ is a vector with all elements equal to one, $e_a$ is a vector from the standard basis for ${\mathbb R}^{n}$, the basis vector $\varepsilon_b$ is from ${\mathbb R}^{m}$, a colon denotes the trace/Frobenius product, e.g. $$A:B={\rm Tr}(A^TB)$$ the symbol $\odot$ denotes the elementwise/Hadamard product, and Hadamard powers are denote as $$A^{\odot 3}=A\odot A\odot A$$ The function $a={\rm diag}(A)$ extracts the diagonal elements of $A$ into a vector $a$
while the function ${\rm Diag}(A)$ sets the off-diagonal elements of $A$ to zero.