Matrix differentiation with trace and Kronecker product

I'm new to matrix calculus and want to differentiate the following function with respect to $X$ and $Y$: $$\phi(X,Y) = Y^TA^T(L\otimes X):Y^TA^T(L\otimes X) = \operatorname{tr}\big((L\otimes X)^TAYY^TA^T(L\otimes X)\big) $$ I know the derivative with respect to $Y$, but have no idea how to start on the derivative with respect to $X$.
I have checked some related questions and solutions, but I still don't understand how the differential $d\phi$ is derived.
Is there a reference on calculating the differential of a matrix function?
Any help will be appreciated!

There are 2 best solutions below

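With $Y$ held fixed, write $W = Y^TA^T(L\otimes X)$, so that $\phi = W:W$. The Kronecker product is linear in $X$, hence $dW = Y^TA^T(L\otimes dX)$, and the product rule gives $d\phi = dW:W + W:dW = 2\,W:dW$. The differential in this case is therefore $$ d\phi = 2 Y^TA^T(L \otimes X):Y^TA^T(L \otimes dX). $$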
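To make the meaning of this differential concrete, here is a minimal NumPy sketch. It assumes $Y$ is held fixed and uses arbitrary small sizes. The helper names `frob`, `W_of`, `phi` and `differential` are illustrative only. Because $\phi$ is quadratic in $X$, the increment $\phi(X+dX)-\phi(X)$ should equal $d\phi$ plus the second-order term $\phi(dX)$, up to rounding.

```python
import numpy as np

def frob(P, Q):
    """Frobenius inner product P:Q = tr(P^T Q)."""
    return float(np.sum(P * Q))

def W_of(X, L, A, Y):
    """W(X) = Y^T A^T (L kron X), so that phi = W:W."""
    return Y.T @ A.T @ np.kron(L, X)

def phi(X, L, A, Y):
    W = W_of(X, L, A, Y)
    return frob(W, W)

def differential(X, dX, L, A, Y):
    """d(phi) = 2 W : Y^T A^T (L kron dX), which is linear in dX."""
    return 2.0 * frob(W_of(X, L, A, Y), W_of(dX, L, A, Y))

# Arbitrary illustrative sizes: X is m x n, L is p x q, A is mp x r, Y is r x s.
rng = np.random.default_rng(0)
m, n, p, q, r, s = 2, 3, 3, 2, 4, 5
X = rng.standard_normal((m, n))
dX = rng.standard_normal((m, n))
L = rng.standard_normal((p, q))
A = rng.standard_normal((m * p, r))
Y = rng.standard_normal((r, s))

# phi is quadratic in X, so the increment is the differential plus phi(dX).
increment = phi(X + dX, L, A, Y) - phi(X, L, A, Y)
print(np.isclose(increment, differential(X, dX, L, A, Y) + phi(dX, L, A, Y)))
```

For a small perturbation the second-order term becomes negligible, which is the sense in which $d\phi$ is the linear part of the change in $\phi$.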

$\def\p#1#2{\frac{\partial #1}{\partial #2}}$ Define $B=(L\otimes X)^TAY$ and assume the following sizes for the matrices
$$\eqalign{ m,n &= {\rm size}(X) \\ p,q &= {\rm size}(L) \\ mp,r &= {\rm size}(A) \\ r,s &= {\rm size}(Y) \\ nq,s &= {\rm size}(B) \\ }$$
Kronecker products can be vectorized with the aid of a commutation matrix $\,(K_{np})$
$$\eqalign{ {\rm vec}(L\otimes X) &= \left(I_q\otimes K_{np}\otimes I_m\right)\cdot \left({\rm vec}(L)\otimes I_m\otimes I_n\right)\cdot {\rm vec}(X) \\ &\doteq M\,{\rm vec}(X) \\ }$$
Write the function in terms of $B$, then calculate its differential and gradient with respect to $X$, holding $Y$ fixed.
$$\eqalign{ \phi &= B^T:B^T \\ d\phi &= 2B^T:dB^T \\ &= 2B^T:(AY)^T(L\otimes dX) \\ &= 2AYB^T:(L\otimes dX) \\ &= 2\,{\rm vec}(AYB^T):M\;{\rm vec}(dX) \\ &= 2\,M^T{\rm vec}(AYB^T):{\rm vec}(dX) \\ &= 2\;{\rm devec}\Big(M^T{\rm vec}(AYB^T)\Big):dX \\ \p{\phi}{X} &= 2\;{\rm devec}\Big(M^T{\rm vec}(AYB^T)\Big) \\ }$$
Here ${\rm devec}$ reshapes a vector of length $mn$ back into an $m\times n$ matrix, undoing ${\rm vec}$. The hardest part of the process is freeing $dX$ from the Kronecker term.
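The derivation above can be checked numerically. The following NumPy sketch makes two assumptions that the text does not spell out: ${\rm vec}$ stacks columns (column-major order), and $K_{np}$ is the commutation matrix satisfying $K_{np}\,{\rm vec}(Z)={\rm vec}(Z^T)$ for an $n\times p$ matrix $Z$. The function names (`commutation_matrix`, `kron_vec_matrix`, `grad_phi_X`, `numerical_grad`) are hypothetical helpers, not part of any library.

```python
import numpy as np

def vec(Z):
    """Column-major (column-stacking) vectorization."""
    return Z.reshape(-1, order="F")

def commutation_matrix(a, b):
    """K with K @ vec(Z) == vec(Z.T) for any Z of shape (a, b)."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # Z[i, j] is entry i + j*a of vec(Z) and entry j + i*b of vec(Z.T).
            K[j + i * b, i + j * a] = 1.0
    return K

def kron_vec_matrix(L, m, n):
    """M such that vec(np.kron(L, X)) == M @ vec(X) for X of shape (m, n)."""
    p, q = L.shape
    perm = np.kron(np.kron(np.eye(q), commutation_matrix(n, p)), np.eye(m))
    return perm @ np.kron(vec(L)[:, None], np.eye(m * n))

def phi(X, L, A, Y):
    W = Y.T @ A.T @ np.kron(L, X)
    return float(np.sum(W * W))

def grad_phi_X(X, L, A, Y):
    """2 * devec(M^T vec(A Y B^T)) with B = (L kron X)^T A Y."""
    m, n = X.shape
    B = np.kron(L, X).T @ A @ Y
    g = kron_vec_matrix(L, m, n).T @ vec(A @ Y @ B.T)
    return 2.0 * g.reshape((m, n), order="F")  # devec: undo the column-major vec

def numerical_grad(X, L, A, Y, h=1e-4):
    """Entrywise central differences (exact up to rounding for a quadratic)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (phi(X + E, L, A, Y) - phi(X - E, L, A, Y)) / (2 * h)
    return G

# Arbitrary illustrative sizes matching the size table above.
rng = np.random.default_rng(1)
m, n, p, q, r, s = 2, 3, 3, 2, 4, 5
X = rng.standard_normal((m, n))
L = rng.standard_normal((p, q))
A = rng.standard_normal((m * p, r))
Y = rng.standard_normal((r, s))

print(np.allclose(kron_vec_matrix(L, m, n) @ vec(X), vec(np.kron(L, X))))
print(np.allclose(grad_phi_X(X, L, A, Y), numerical_grad(X, L, A, Y), atol=1e-6))
```

Building $M$ explicitly costs $O(m^2n^2pq)$ memory, so this is meant as a correctness check on small sizes rather than an efficient implementation.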

Vectorization/devectorization was used to handle the Kronecker product in this example, but one might also employ the singular value decomposition or the Pitsianis decomposition of the matrix $AYB^T$, both of which were demonstrated in your linked answer.

Another possibility is a block Kronecker decomposition of the $(AYB^T)$ matrix.

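The key idea is that the Kronecker and Frobenius products factor nicely: $$(A\otimes B):(C\otimes dX) \;=\; (A:C)\,(B:dX)$$ for compatibly dimensioned matrices. Here $A,B,C$ denote generic matrices, with $A$ the same size as $C$ and $B$ the same size as $dX$; they are not the $A$ and $B$ of the derivation above.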
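As a rough numerical illustration of this property, the NumPy sketch below checks it on random matrices, with `P`, `Q`, `R` standing in for the generic $A$, $B$, $C$. It also shows one consequence that follows by applying the same idea block by block: if $G$ is an $mp\times nq$ matrix partitioned into $m\times n$ blocks $G_{ij}$, then $G:(L\otimes dX)=\big(\sum_{i,j}L_{ij}G_{ij}\big):dX$. The helper `block_contract` is a hypothetical name introduced here for that sum.

```python
import numpy as np

def frob(P, Q):
    """Frobenius inner product P:Q = tr(P^T Q)."""
    return float(np.sum(P * Q))

def block_contract(G, L, m, n):
    """Sum over (i, j) of L[i, j] * G_ij, where G_ij is the (i, j) block of size m x n."""
    p, q = L.shape
    blocks = G.reshape(p, m, q, n)  # blocks[i, :, j, :] is the block G_ij
    return np.einsum("ij,iajb->ab", L, blocks)

rng = np.random.default_rng(2)
p, q, m, n = 3, 2, 2, 4

# P and R play the roles of A and C (same shape); Q and dX share a shape.
P = rng.standard_normal((p, q))
R = rng.standard_normal((p, q))
Q = rng.standard_normal((m, n))
dX = rng.standard_normal((m, n))

lhs = frob(np.kron(P, Q), np.kron(R, dX))
rhs = frob(P, R) * frob(Q, dX)
print(np.isclose(lhs, rhs))

# Block version: G need not be a Kronecker product itself.
G = rng.standard_normal((m * p, n * q))
print(np.isclose(frob(G, np.kron(P, dX)), frob(block_contract(G, P, m, n), dX)))
```

Taking $G=AYB^T$ and the problem's $L$ in place of `P`, the same contraction would give $\partial\phi/\partial X = 2\sum_{i,j}L_{ij}G_{ij}$ without forming any commutation matrix.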