Derivative of the trace of a Kronecker product


I am trying to compute the derivative

$\frac{\partial}{\partial W} \text{Tr}(W^\top A (I\otimes W)B),$

where $W\in\mathbb{R}^{D\times d}$, $I\in\mathbb{R}^{T\times T}$ is the identity matrix, $A\in\mathbb{R}^{D\times DT}$, and $B\in\mathbb{R}^{dT\times d}$.

I have found a similar post, "Derivative involving the trace of a Kronecker product", but it seems that the method is not applicable to my problem.

Thank you!


There is 1 best solution below


The technique from the linked post can be applied to the current problem.

Write the function in terms of the trace/Frobenius product, $X:Y={\rm tr}(X^TY)$, and find its differential.
$$\eqalign{ \phi &= W:A(I\otimes W)B = A^TWB^T:(I\otimes W) \cr d\phi &= A(I\otimes W)B:dW + A^TWB^T:(I\otimes dW) }$$
At this point, we need to use the Pitsianis decomposition on the last term.
$$\eqalign{ A^TWB^T &= \sum_k Y_k\otimes Z_k \cr }$$
The matrices $(Y_k,Z_k)$ are shaped like $(I,W)$ respectively, i.e. $Y_k\in\mathbb{R}^{T\times T}$ and $Z_k\in\mathbb{R}^{D\times d}$.

Finish calculating the differential using the identity $(Y_k\otimes Z_k):(I\otimes dW) = (I:Y_k)(Z_k:dW)$, then read off the gradient.
$$\eqalign{ d\phi &= A(I\otimes W)B:dW + \sum_k (Y_k\otimes Z_k):(I\otimes dW) \cr &= \Big(A(I\otimes W)B + \sum_k(I:Y_k)Z_k\Big):dW \cr \frac{\partial\phi}{\partial W} &= A(I\otimes W)B + \sum_k {\rm tr}(Y_k)\,Z_k \cr }$$
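A quick numerical sanity check of this formula can be sketched in NumPy. This is only an illustration under stated assumptions: the function names (`phi`, `grad_pitsianis`, `numerical_grad`) are made up for this example, and the decomposition is computed by taking an SVD of the block-rearranged matrix, which is one common way to obtain a sum of Kronecker products.

```python
import numpy as np

def phi(W, A, B, T):
    """phi(W) = tr(W^T A (I_T kron W) B)."""
    return np.trace(W.T @ A @ np.kron(np.eye(T), W) @ B)

def grad_pitsianis(W, A, B, T):
    """Gradient A (I kron W) B + sum_k tr(Y_k) Z_k (hypothetical helper name)."""
    D, d = W.shape
    M = A.T @ W @ B.T  # shape (T*D, T*d), the same shape as I kron W
    # Rearrangement: row (i, j) of R holds the flattened (i, j) block of M,
    # so a rank-one term of R corresponds to one Kronecker term Y_k kron Z_k.
    R = M.reshape(T, D, T, d).transpose(0, 2, 1, 3).reshape(T * T, D * d)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Ys = [(s[k] * U[:, k]).reshape(T, T) for k in range(len(s))]
    Zs = [Vt[k].reshape(D, d) for k in range(len(s))]
    # The Kronecker terms should reproduce M up to rounding.
    assert np.allclose(sum(np.kron(Y, Z) for Y, Z in zip(Ys, Zs)), M)
    second = sum(np.trace(Y) * Z for Y, Z in zip(Ys, Zs))
    return A @ np.kron(np.eye(T), W) @ B + second

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences, one entry of W at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

# Small random instance (sizes chosen arbitrarily for the check).
rng = np.random.default_rng(0)
D, d, T = 3, 2, 4
W = rng.standard_normal((D, d))
A = rng.standard_normal((D, D * T))
B = rng.standard_normal((d * T, d))
G_formula = grad_pitsianis(W, A, B, T)
G_numeric = numerical_grad(lambda X: phi(X, A, B, T), W)
print("max abs difference:", np.abs(G_formula - G_numeric).max())
```

Note that $\sum_k {\rm tr}(Y_k)\,Z_k$ is just the sum of the $T$ diagonal $D\times d$ blocks of $A^TWB^T$, so in practice the SVD step can be skipped; it is kept here only to mirror the derivation.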


Another technique uses the SVD of
$$B=\sum_k\sigma_ku_kv_k^T$$
to handle the second term of $d\phi$ as follows.
$$\eqalign{ A^TW:(I\otimes dW)B &= \sum_k\,A^TW:(I\otimes dW)\sigma_ku_kv_k^T \cr &= \sum_k\,(A^TW\sigma_kv_k):(I\otimes dW)u_k \cr &= \sum_k\,q_k:{\rm vec}(dW\,U_k) \cr &= \sum_k\,Q_k:dW\,U_k \cr &= \sum_k\,Q_kU_k^T:dW \cr }$$
where
$$\eqalign{ {\rm vec}(Q_k) &= q_k = A^TW\sigma_kv_k \cr {\rm vec}(U_k) &= u_k \cr }$$
Here ${\rm vec}$ stacks columns, so $Q_k\in\mathbb{R}^{D\times T}$ and $U_k\in\mathbb{R}^{d\times T}$, and the third line uses $(I\otimes dW)\,{\rm vec}(U_k) = {\rm vec}(dW\,U_k)$. This yields the gradient
$$\eqalign{ \frac{\partial\phi}{\partial W} &= A(I\otimes W)B + \sum_k Q_kU_k^T \cr }$$
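The SVD-based formula can be checked the same way. The sketch below is a minimal NumPy illustration, not a reference implementation: the names `grad_svd_of_B` and `numerical_grad` are invented for the example, and the column-stacking ${\rm vec}$ is assumed to correspond to `order="F"` reshapes.

```python
import numpy as np

def phi(W, A, B, T):
    """phi(W) = tr(W^T A (I_T kron W) B)."""
    return np.trace(W.T @ A @ np.kron(np.eye(T), W) @ B)

def grad_svd_of_B(W, A, B, T):
    """Gradient A (I kron W) B + sum_k Q_k U_k^T (hypothetical helper name)."""
    D, d = W.shape
    U, s, Vt = np.linalg.svd(B, full_matrices=False)  # B = sum_k s_k u_k v_k^T
    second = np.zeros((D, d))
    for k in range(len(s)):
        q_k = A.T @ W @ (s[k] * Vt[k])       # vector of length D*T
        Q_k = q_k.reshape(D, T, order="F")   # un-vec with column stacking
        U_k = U[:, k].reshape(d, T, order="F")
        second += Q_k @ U_k.T
    return A @ np.kron(np.eye(T), W) @ B + second

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences, one entry of W at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

# Small random instance (sizes chosen arbitrarily for the check).
rng = np.random.default_rng(0)
D, d, T = 3, 2, 4
W = rng.standard_normal((D, d))
A = rng.standard_normal((D, D * T))
B = rng.standard_normal((d * T, d))
G_formula = grad_svd_of_B(W, A, B, T)
G_numeric = numerical_grad(lambda X: phi(X, A, B, T), W)
print("max abs difference:", np.abs(G_formula - G_numeric).max())
```

Because $\phi$ is quadratic in $W$, the central difference has no truncation error, so any mismatch beyond rounding would point to an indexing or `vec` convention mistake.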