Gradient of trace of Kronecker product of matrix products

How do I derive the following gradient: $\nabla_X\left[tr(AX \otimes AX)\right]$? I'm not assuming any structure in $A \in \mathbb{R}^{m \times n}$ and $X \in \mathbb{R}^{n \times m}$.

I know that $\nabla_X\left[tr(X \otimes X)\right]=2tr(X)I$ from the Matrix Cookbook, but I'm not sure how to apply the chain rule in this case.

I did try expanding out the entire function and finding a partial derivative, which gave me: \begin{align*} \frac{\partial[tr(AX \otimes AX)]}{\partial X_{uv}}=2\alpha_u \sum^m_{i=1}\sum^n_{j=1} \alpha_i X_{ij} \end{align*} where $\alpha_i = \mathbf{1}_m^T A_i$ is the sum of the $i$th column of $A$.

This does agree with $\nabla_X\left[tr(X \otimes X)\right]=2tr(X)I$ if we set $A=I_n$. But I still suspect there's something wrong with it, and in any case there has to be a better way than by brute force.

Best answer

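You can make use of this mixed-product rule for the Kronecker and Frobenius products
$$ (A\otimes B) : (X\otimes Y) = (A:X)\otimes(B:Y) $$
along with the Frobenius-trace relationship
$$A:B={\rm tr}(A^TB)$$
to rewrite the function and find its differential. Each Frobenius product is a scalar, so the Kronecker product on the right-hand side is just ordinary multiplication. Taking $I=I_m$, so that $I\otimes I=I_{m^2}$,
$$\eqalign{ f &= {\rm tr}(AX\otimes AX) \cr &= (I\otimes I):(AX\otimes AX) \cr &= (I:AX)\otimes(I:AX) \cr &= (A^T:X)\otimes(A^T:X) \cr &= (A^T:X)^2 \cr\cr df &= 2\,(A^T:X)\,\,d(A^T:X) \cr &= 2\,(A^T:X)\,\,A^T:dX \cr }$$
Since $df=(\frac{\partial f}{\partial X}:dX),\,$ the gradient must be
$$\eqalign{ \frac{\partial f}{\partial X} &= 2\,(A^T:X)\,\,A^T \cr &= 2\,{\rm tr}(AX)\,A^T }$$
Entrywise, this reads $\frac{\partial f}{\partial X_{uv}} = 2\,{\rm tr}(AX)\,A_{vu}$, which reduces to the Matrix Cookbook result $2\,{\rm tr}(X)\,I$ when $A=I$.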
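The closed form $2\,{\rm tr}(AX)\,A^T$ can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `f`, `grad_f` and `numerical_grad`, the sizes `m, n = 3, 4`, and the random test matrices are illustrative choices, not part of the original post. It evaluates $f$ literally through `np.kron` and compares the formula against central finite differences.

```python
import numpy as np

def f(A, X):
    # f(X) = tr(AX ⊗ AX), evaluated literally via the Kronecker product
    M = A @ X
    return np.trace(np.kron(M, M))

def grad_f(A, X):
    # Closed form from the derivation above: 2 tr(AX) A^T
    return 2.0 * np.trace(A @ X) * A.T

def numerical_grad(A, X, eps=1e-6):
    # Central differences, perturbing one entry of X at a time
    G = np.zeros_like(X)
    for u in range(X.shape[0]):
        for v in range(X.shape[1]):
            E = np.zeros_like(X)
            E[u, v] = eps
            G[u, v] = (f(A, X + E) - f(A, X - E)) / (2 * eps)
    return G

# Arbitrary test sizes and random matrices (assumptions for illustration)
rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))   # A is m x n
X = rng.standard_normal((n, m))   # X is n x m, so AX is m x m

max_err = np.max(np.abs(grad_f(A, X) - numerical_grad(A, X)))
print("max abs difference:", max_err)
```

Because $f$ is quadratic in $X$, central differences are exact up to floating-point rounding, so the reported difference should be tiny. The gradient has the same shape as $X$ ($n \times m$), which matches $A^T$.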