$C \in \mathbb{R}^{m \times n}, X \in \mathbb{R}^{m \times n}, W \in \mathbb{R}^{m \times k}, H \in \mathbb{R}^{n \times k}$
$W_{i.}$ is the $i$th row of $W$
$H_{j.}$ is the $j$th row of $H$
$$f(W,H)=\sum_{i,j} C_{ij}\,(X_{ij} - W_{i.}H_{j.}^T)^2 = \sum_{i,j}\big[C \circ (X - WH^T)\circ(X - WH^T)\big]_{ij}, \qquad (W^*, H^*) = \arg\min_{W, H} f(W,H)$$
$\circ$: Hadamard product (element-wise product)
$$\frac {\partial{f}} {\partial{W}} = ?$$ $$\frac {\partial f}{\partial H} = ?$$
Hi everyone,
I have a chemoinformatics background (and am weak in mathematics). Recently I have been learning matrix factorization: given a matrix $X$ with entries in $\{0, 1\}$, decompose it into two matrices $W$ and $H$. To obtain the optimal $W$ and $H$, one needs the partial derivatives of the objective function $f$ with respect to $W$ and $H$. I can derive these partial derivatives when the matrix factorization (MF) is not weighted, using the relationship between the Frobenius norm and the trace. However, for weighted MF (as in the objective above, which has an additional weighting matrix $C$ applied through a Hadamard product), I cannot derive the partial derivatives with respect to $W$ and $H$, even though some papers (such as "collaborative filtering for implicit feedback datasets" and "Evaluation of one-class collaborative filtering") have derived them.
Is there a trick for dealing with the Hadamard (element-wise) product? Can it be converted into an ordinary matrix product? How do I obtain the partial derivatives with respect to $W$ and $H$? Can linear algebra or matrix algebra solve this?
Note: $X$ is a matrix with entries in $\{0, 1\}$ that is decomposed into the matrices $W$ and $H$. $C$ is the weighting matrix, which has the same dimensions as $X$.
Thanks.
For convenience let $$\eqalign{ M &= X - WH^T \cr dM &= -(dW\,H^T + W\,dH^T) \cr }$$
Then write the objective function in terms of the Frobenius (:) inner product, $A:B = \sum_{i,j} A_{ij}B_{ij} = \operatorname{tr}(A^TB)$, and the Hadamard ($\circ$) product, and find its differential
$$\eqalign{ f &= C:M\circ M \cr df &= 2\,C:M\circ dM \cr &= 2\,C\circ M : dM \cr &= -2\,C\circ M : (dW\,H^T + W\,dH^T) \cr }$$
Now set $dH=0$ to obtain the gradient with respect to $W$
$$\eqalign{ df &= -2\,C\circ M : dW\,H^T \cr &= -2\,(C\circ M)\,H : dW \cr\cr \frac{\partial f}{\partial W} &= -2\,(C\circ M)\,H \cr }$$
Similarly, setting $dW=0$ yields the gradient with respect to $H$
$$\eqalign{ df &= -2\,C\circ M : W\,dH^T \cr &= -2\,(C\circ M)^T\,W : dH \cr\cr \frac{\partial f}{\partial H} &= -2\,(C\circ M)^T\,W \cr }$$
The above derivation uses the fact that the Frobenius and Hadamard products commute with themselves and with each other, i.e.
$$\eqalign{ A\circ B &= B\circ A \cr A:B &= B: A \cr A\circ B:C &= A:B\circ C \cr }$$
and a rearrangement property of the Frobenius product, which follows from the cyclic property of the trace
$$\eqalign{ AB:C &= A:CB^T \cr AB:C &= B:A^TC \cr }$$
and the differential properties of the products
$$\eqalign{ d(A:B) &= A:dB + B:dA \cr d(A\circ B) &= A\circ dB + B\circ dA \cr }$$
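As a sanity check, the two gradient formulas can be compared against central finite differences. The following is only a minimal NumPy sketch, not code from any of the cited papers: the function names (`weighted_mf_loss`, `weighted_mf_grads`, `numerical_grad`), the matrix sizes, and the random test data are all assumptions chosen for illustration.

```python
import numpy as np

def weighted_mf_loss(C, X, W, H):
    """f = sum_ij C_ij * (X_ij - W_i. H_j.^T)^2, i.e. C : (M o M)."""
    M = X - W @ H.T
    return np.sum(C * M * M)

def weighted_mf_grads(C, X, W, H):
    """Closed-form gradients from the derivation above."""
    M = X - W @ H.T
    G = C * M                    # Hadamard product C o M
    grad_W = -2.0 * G @ H        # shape (m, k)
    grad_H = -2.0 * G.T @ W      # shape (n, k)
    return grad_W, grad_H

def numerical_grad(f, A, eps=1e-6):
    """Central finite differences of the scalar f() with respect to each entry of A.

    A is perturbed in place and restored after each evaluation.
    """
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        f_plus = f()
        A[idx] = old - eps
        f_minus = f()
        A[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Hypothetical small test problem (sizes and values are arbitrary).
rng = np.random.default_rng(0)
m, n, k = 5, 4, 3
X = rng.integers(0, 2, size=(m, n)).astype(float)  # {0, 1} entries
C = rng.uniform(0.1, 1.0, size=(m, n))             # positive weights
W = rng.normal(size=(m, k))
H = rng.normal(size=(n, k))

grad_W, grad_H = weighted_mf_grads(C, X, W, H)
num_W = numerical_grad(lambda: weighted_mf_loss(C, X, W, H), W)
num_H = numerical_grad(lambda: weighted_mf_loss(C, X, W, H), H)

max_err_W = np.max(np.abs(grad_W - num_W))
max_err_H = np.max(np.abs(grad_H - num_H))
print("max |analytic - numeric| for W:", max_err_W)
print("max |analytic - numeric| for H:", max_err_H)
```

Both printed discrepancies should be tiny compared with the size of the gradient entries. Note that in code the Hadamard product is simply `C * M`, so no conversion to an ordinary matrix product is needed.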