Gradient of the matrix function $\operatorname{trace}\big((W \odot (A-UV'))'(W\odot (A-UV'))\big)$


What is the gradient of the following function with respect to $U$?

$$ \operatorname{trace}\big((W \odot (A-UV'))'(W\odot (A-UV'))\big) $$

Here, $\odot$ is the Hadamard product and $'$ is the transpose. It seems that the gradient is $2(W\odot(UV'-A))V$. Can anyone explain how this is obtained?


There are 3 best solutions below

4
On BEST ANSWER

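Let's use a colon to denote the trace/Frobenius product
$$A:B={\rm tr}(A^TB)$$
For typing convenience, let's also define two additional matrices
$$\eqalign{ Y &= UV^T-A \cr X &= W\odot Y \cr }$$
The objective is a sum of squares, so using $Y=UV^T-A$ in place of $A-UV^T$ does not change its value. Write the function in terms of these new variables and find its differential and gradient
$$\eqalign{ \phi &= X:X \cr d\phi &= 2X:dX \cr &= 2X:(W\odot dY) \cr &= 2(W\odot X):dY \cr &= 2(W\odot X):dU\,V^T \cr &= 2(W\odot X)V:dU \cr \frac{\partial\phi}{\partial U} &= 2(W\odot X)V \cr &= 2(W\odot W\odot Y)V \cr &= 2\Big(W\odot W\odot(UV^T-A)\Big)V \cr }$$
The step from $X:(W\odot dY)$ to $(W\odot X):dY$ uses the fact that the Hadamard and Frobenius products commute, $A:(B\odot C)=(A\odot B):C$, and the step to $(W\odot X)V:dU$ uses $A:BC=AC^T:B$.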
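The closed form above can be checked numerically. The following is a minimal sketch using only the Python standard library; the matrix sizes, the random seed, and the helper names (`matmul`, `objective`, `gradient`, `numerical_gradient`) are illustrative assumptions, not part of the answer. It compares $2\big(W\odot W\odot(UV^T-A)\big)V$ with central finite differences of the objective.

```python
import random

def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def objective(U, V, A, W):
    """phi(U) = sum_ij (W_ij * (A_ij - (U V^T)_ij))^2, i.e. trace(X^T X)."""
    UVt = matmul(U, transpose(V))
    return sum((W[i][j] * (A[i][j] - UVt[i][j])) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

def gradient(U, V, A, W):
    """Closed form from the answer: 2 (W o W o (U V^T - A)) V."""
    UVt = matmul(U, transpose(V))
    G = [[W[i][j] ** 2 * (UVt[i][j] - A[i][j]) for j in range(len(A[0]))]
         for i in range(len(A))]
    return [[2.0 * g for g in row] for row in matmul(G, V)]

def numerical_gradient(U, V, A, W, h=1e-6):
    """Central differences, perturbing one entry of U at a time."""
    num = [[0.0] * len(U[0]) for _ in U]
    for m in range(len(U)):
        for n in range(len(U[0])):
            old = U[m][n]
            U[m][n] = old + h
            f_plus = objective(U, V, A, W)
            U[m][n] = old - h
            f_minus = objective(U, V, A, W)
            U[m][n] = old  # restore the entry
            num[m][n] = (f_plus - f_minus) / (2 * h)
    return num

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Assumed sizes: A and W are m x n, U is m x r, V is n x r.
rng = random.Random(0)
m, n, r = 4, 5, 3
A, W = rand_matrix(m, n, rng), rand_matrix(m, n, rng)
U, V = rand_matrix(m, r, rng), rand_matrix(n, r, rng)

exact = gradient(U, V, A, W)
approx = numerical_gradient(U, V, A, W)
max_err = max(abs(exact[i][k] - approx[i][k])
              for i in range(m) for k in range(r))
print("max abs difference:", max_err)
```

Because the objective is quadratic in $U$, central differences have no truncation error here, so the reported difference should be at the level of floating-point round-off.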

0
On

Letting $X=W\circ(A - UV')$, in terms of components we have
$$ \begin{align} \frac{\partial}{\partial U_{mn} }\left( \operatorname{Tr}X'X \right) &= \frac{\partial}{\partial U_{mn}}\left( \sum_{i, j} X_{ij}^2 \right) \\ &= 2\sum_{i,j} X_{ij}\frac{\partial X_{ij}}{\partial U_{mn} } . \end{align} $$
Checking the components, and writing $V_{kj}'$ for the $(k,j)$ entry of $V'$ (so $V_{kj}' = V_{jk}$), we have
$$ \begin{align} \frac{\partial X_{ij}}{\partial U_{mn} } &= \frac{\partial}{\partial U_{mn}} \left[ W_{ij} \cdot \left(A_{ij} - \sum_k U_{ik} V_{kj}' \right) \right] \\ &= - \delta_{im} \cdot W_{ij} \cdot V_{nj}', \end{align} $$
and so
$$ \begin{align} \frac{\partial }{\partial U_{mn} }\operatorname{Tr}X'X &= 2\sum_{ij}X_{ij}\frac{\partial X_{ij}}{\partial U_{mn}} \\ &= 2\sum_{i, j} W_{ij}\cdot\left(\sum_k U_{ik} V_{kj}' - A_{ij}\right)\delta_{im}W_{ij}V_{nj}' \\ &= 2\sum_j W^2_{mj} \left([UV']_{mj} - A_{mj}\right)V_{jn} \\ &= 2 \sum_j [(W\circ W)\circ(UV' - A )]_{mj}[V]_{jn} \\ &= 2 \left[\left((W\circ W)\circ (UV' - A)\right) V\right]_{mn} \end{align} $$
or, with $W^{\circ 2} = W\circ W$,
$$ \frac{\partial }{\partial U} \operatorname{Tr}(X'X) = 2\left(W^{\circ 2} \circ (UV' - A) \right) V $$
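As a quick sanity check of the component formula, consider the scalar case, where all matrices are assumed to be $1\times 1$ with entries $w, a, u, v$ (these lowercase symbols are introduced here only for illustration). The function and its derivative are then
$$ \phi(u) = w^2 (a - uv)^2, \qquad \frac{d\phi}{du} = 2 w^2 (a - uv)(-v) = 2 w^2 (uv - a)\, v, $$
which agrees with $2\left(W^{\circ 2} \circ (UV' - A) \right) V$: the weight enters squared, not to the first power.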

0
On

A little trick: the product rule for derivatives also holds for matrix and Hadamard products. This means that

$$ \frac d{dU_{i,j}}\operatorname{trace}\big((W \odot (A-UV'))'(W\odot (A-UV'))\big) $$
$$ = \operatorname{trace}\Big(\frac d{dU_{i,j}}\big[(W \odot (A-UV'))'\big](W\odot (A-UV')) + (W \odot (A-UV'))'\frac d{dU_{i,j}}\big[W\odot (A-UV')\big] \Big) $$
$$ = \operatorname{trace}\big((W \odot (-e_ie_j^TV'))'(W\odot (A-UV')) + (W \odot (A-UV'))'(W\odot (-e_ie_j^TV')) \big) $$
where $e_i$ and $e_j$ are standard basis vectors. But $\operatorname{tr}(X'Y+Y'X) = 2\operatorname{tr}(X'Y)$ (this works when the matrices are real), so the expression equals
$$ 2\operatorname{trace}\big((W \odot (-e_ie_j^TV'))'(W\odot (A-UV')) \big) = 2\operatorname{trace}\big((W' \odot Ve_je_i^T)(W\odot (UV'-A)) \big) $$
It's easy to see that $W' \odot Ve_je_i^T = (W'e_i \odot Ve_j)e_i^T$, and if $v,w$ are vectors, then $\operatorname{tr}(vw^T)=\sum_k v_kw_k$, so this becomes
$$ 2\operatorname{trace}\big((W'e_i \odot Ve_j)e_i^T(W\odot (UV'-A)) \big) $$
$$ = 2 \sum_k W'_{ki}V_{kj}(W\odot (UV'-A))_{ik} = 2 \sum_k (W\odot (UV'-A))_{ik}W_{ik}V_{kj} $$
$$ = 2 [(W\odot W\odot (UV'-A))V]_{ij} $$
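The identity $W' \odot Ve_je_i^T = (W'e_i \odot Ve_j)e_i^T$ used above can be spot-checked on a small example. This is only a sketch with hand-picked integer matrices and zero-based indices; the sizes and the helper names (`matmul`, `hadamard`, `unit_column`) are assumptions made for illustration.

```python
def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def hadamard(P, Q):
    """Entrywise product of two matrices of the same shape."""
    return [[p * q for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(P, Q)]

def unit_column(size, index):
    """Standard basis vector e_index as a size x 1 column."""
    return [[1 if k == index else 0] for k in range(size)]

W = [[1, 2, 3], [4, 5, 6]]       # m x n with m = 2, n = 3
V = [[1, -1], [0, 2], [3, 1]]    # n x r with r = 2
i, j = 1, 0                      # zero-based indices of the entry U_ij

e_i = unit_column(2, i)          # length m
e_j = unit_column(2, j)          # length r

# Left side: W' o (V e_j e_i^T), an n x m matrix.
lhs = hadamard(transpose(W), matmul(matmul(V, e_j), transpose(e_i)))
# Right side: (W' e_i o V e_j) e_i^T, also n x m.
rhs = matmul(hadamard(matmul(transpose(W), e_i), matmul(V, e_j)),
             transpose(e_i))
print(lhs == rhs)
```

Both sides are zero outside column $i$, and column $i$ holds the entrywise product of column $i$ of $W'$ with column $j$ of $V$, which is why the trace collapses to a single sum over $k$.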


Note: in the article, $W$ is an indicator matrix (its entries are 0 or 1), so $W\odot W=W$ and the result reduces to the formula in the question, $2(W\odot(UV'-A))V$.
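The effect of the indicator assumption can be illustrated with a small exact computation. The sketch below uses hand-picked integer matrices and a hypothetical helper `weighted_gradient` that evaluates either $2(W\odot W\odot(UV'-A))V$ or $2(W\odot(UV'-A))V$; none of these names or values come from the article.

```python
def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def weighted_gradient(U, V, A, W, square_weights):
    """2 (W o W o (U V' - A)) V if square_weights, else 2 (W o (U V' - A)) V."""
    UVt = matmul(U, transpose(V))
    G = [[(W[i][j] ** 2 if square_weights else W[i][j]) * (UVt[i][j] - A[i][j])
          for j in range(len(A[0]))] for i in range(len(A))]
    return [[2 * g for g in row] for row in matmul(G, V)]

A = [[1, 2, 3], [4, 5, 6]]            # 2 x 3
U = [[1, 0], [2, 1]]                  # 2 x 2
V = [[1, 1], [0, 2], [3, 1]]          # 3 x 2
W_mask = [[1, 0, 1], [0, 1, 1]]       # indicator (0/1) weights
W_general = [[2, 0, 1], [0, 3, 1]]    # general weights

# With 0/1 weights, W o W = W, so both formulas agree.
same_for_mask = (weighted_gradient(U, V, A, W_mask, True)
                 == weighted_gradient(U, V, A, W_mask, False))
# With general weights the two formulas differ.
same_for_general = (weighted_gradient(U, V, A, W_general, True)
                    == weighted_gradient(U, V, A, W_general, False))
print(same_for_mask, same_for_general)
```

For a general weight matrix, the version with $W\odot W$ is the one that matches the derivations in the answers.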