Derivative of a trace with respect to a vector inside a Kronecker product

I'm trying to obtain the derivative with respect to $\beta$ of

$\textrm{Tr}(A(I_n \otimes \beta)B(I_n \otimes \beta))$.

I've tried to follow the same procedure as in the question "Derivative of a trace with second order Kronecker product", but I'm confused. I tried to write the expression as

$X^\top A^\top : BX$

by taking $X = (I_n \otimes \beta)$ but I couldn't go further than this.

Any thoughts?

There are 2 best solutions below

Though not the most elegant solution, here is one that gives the derivative with respect to $\beta$ by vectorizing. It starts from the standard textbook result $${\rm Tr}\{A X B X^{\rm T} C^{\rm T}\} = {\rm vec}\{X\}^{\rm T} \cdot \left(B^{\rm T} \otimes (C^{\rm T} A) \right) \cdot {\rm vec}\{X\}.$$

Take $\beta \in \mathbb{R}^m$, so that $I_n \otimes \beta$ is $nm \times n$ and $A$, $B$ must both be $n \times nm$ for the trace to be defined. In your problem the second factor is $X$ rather than $X^{\rm T}$, so the same argument (with $C = I$) gives $${\rm Tr}\{A (I_n \otimes \beta) B (I_n \otimes \beta) \} = {\rm vec}\{I_n \otimes \beta\}^{\rm T} \cdot \left(B \otimes A^{\rm T} \right) \cdot {\rm vec}\{(I_n \otimes \beta)^{\rm T}\}.$$

Next, use the fact that ${\rm vec}\{X^{\rm T}\} = K_{p,q} \cdot {\rm vec}\{X\}$ for any $X$ of size $p \times q$, where $K_{p,q}$ is the commutation matrix, so that we have $${\rm Tr}\{A (I_n \otimes \beta) B (I_n \otimes \beta) \} = {\rm vec}\{I_n \otimes \beta\}^{\rm T} \cdot \left(B \otimes A^{\rm T} \right) \cdot K_{nm,n} \cdot {\rm vec}\{I_n \otimes \beta\}.$$

Then, since the $i$-th column of $I_n \otimes \beta$ is $e_{n,i} \otimes \beta$, we can write ${\rm vec}\{I_n \otimes \beta\} = P \cdot \beta$, where $P = {\rm vec}\{I_n\} \otimes I_m = [(e_{n,1} \otimes I_m)^{\rm T}, \ldots, (e_{n,n} \otimes I_m)^{\rm T}]^{\rm T}$. This gives $${\rm Tr}\{A (I_n \otimes \beta) B (I_n \otimes \beta) \} = \beta^{\rm T}\cdot P^{\rm T} \cdot \left(B \otimes A^{\rm T} \right) \cdot K_{nm,n} \cdot P \cdot \beta.$$

Finally, we have $\frac{\partial}{\partial q} q^{\rm T} M q = (M + M^{\rm T}) q$, which gives $$\frac{\partial}{\partial \beta} {\rm Tr}\{A (I_n \otimes \beta) B (I_n \otimes \beta) \} = P^{\rm T} \cdot [\left(B \otimes A^{\rm T} \right) \cdot K_{nm,n}+K_{nm,n}^{\rm T} \left(B^{\rm T} \otimes A \right) ] \cdot P \cdot \beta.$$

Yeah. Not the most elegant solution, as I said.
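
The textbook vec/Kronecker identity this answer starts from is easy to sanity-check numerically. The sketch below is plain Python using only the standard library; the helper names (`matmul`, `kron`, `vec`, and so on) and the test sizes `p`, `q`, `r` are illustrative assumptions, not part of the answer. It compares ${\rm Tr}\{A X B X^{\rm T} C^{\rm T}\}$ with ${\rm vec}\{X\}^{\rm T} (B^{\rm T} \otimes (C^{\rm T} A)) {\rm vec}\{X\}$ on random matrices.

```python
import random

def matmul(P, Q):
    """Plain nested-list matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def kron(P, Q):
    """Kronecker product of two nested-list matrices."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def vec(P):
    """Stack the columns of P into one flat list (column-major order)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
p, q, r = 3, 2, 4                      # arbitrary test sizes (assumption)
X = rand_matrix(p, q, rng)
A = rand_matrix(r, p, rng)
B = rand_matrix(q, q, rng)
C = rand_matrix(r, p, rng)

# Left-hand side: Tr{A X B X^T C^T}
lhs = trace(matmul(matmul(matmul(matmul(A, X), B), transpose(X)), transpose(C)))

# Right-hand side: vec{X}^T (B^T kron (C^T A)) vec{X}
M = kron(transpose(B), matmul(transpose(C), A))
v = vec(X)
rhs = sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

identity_gap = abs(lhs - rhs)
print(identity_gap)  # should be at rounding-error level
```

Note that `vec` stacks columns, which is the convention the Kronecker identity relies on; a row-major `vec` would need the factors of the Kronecker product swapped.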

Let's follow through on your idea and take $$ \def\c#1{\color{red}{#1}} \def\b{\beta} \def\e{{\large\varepsilon}} \def\p{\partial} \def\grad#1#2{\frac{\p#1}{\p#2}}\\ \def\J{{\cal J}} X = I\otimes\b $$

Then you can write the cost function in two equivalent ways $$\J = X^TA^T:BX = X^TB^T:AX$$

You need both forms to calculate the differential of the function $$\eqalign{ d\J &= X^TA^T:B\,dX \;+\; X^TB^T:A\,dX \\ &= \c{(B^TX^TA^T + A^TX^TB^T)}:dX \\ &\equiv \,\c{G}:dX \\ &= \,G:(I\otimes d\b) \\ }$$

The derivative of $\b$ with respect to its component $\b_k$ is the $k^{th}$ Cartesian basis vector, i.e. $$\eqalign{ \grad{\b}{\b_k} &= \e_k \\ }$$

Substituting this into the previous result yields $$\eqalign{ \grad{\J}{\b_k} &= G:(I\otimes\e_k) \\ }$$
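
As a rough check of this result, the sketch below compares the formula against central finite differences. It is plain Python with hypothetical helper names, and it assumes $\beta \in \mathbb{R}^m$, so that $X$ is $nm \times n$ and $A$, $B$ are $n \times nm$. Because $I \otimes \varepsilon_k$ has a single 1 in each column (at row $(i-1)m + k$ of column $i$), the Frobenius product $G:(I\otimes\varepsilon_k)$ reduces to a sum of $n$ entries of $G$, which is what `gradient` computes with 0-based indices.

```python
import random

def matmul(P, Q):
    """Plain nested-list matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def kron(P, Q):
    """Kronecker product of two nested-list matrices."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def build_X(beta, n):
    """X = I_n kron beta, an (n*m) x n matrix."""
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return kron(eye, [[b] for b in beta])

def cost(A, B, beta, n):
    """J = Tr(A X B X)."""
    X = build_X(beta, n)
    return trace(matmul(matmul(matmul(A, X), B), X))

def gradient(A, B, beta, n):
    """dJ/dbeta_k = G : (I kron e_k), with G = B^T X^T A^T + A^T X^T B^T."""
    m = len(beta)
    Xt = transpose(build_X(beta, n))
    At, Bt = transpose(A), transpose(B)
    T1 = matmul(matmul(Bt, Xt), At)
    T2 = matmul(matmul(At, Xt), Bt)
    G = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(T1, T2)]
    # (I kron e_k) has a 1 at row i*m + k of column i, zeros elsewhere
    return [sum(G[i * m + k][i] for i in range(n)) for k in range(m)]

def finite_difference(A, B, beta, n, h=1e-6):
    """Central differences of the cost, one component of beta at a time."""
    grad = []
    for k in range(len(beta)):
        up = beta[:k] + [beta[k] + h] + beta[k + 1:]
        down = beta[:k] + [beta[k] - h] + beta[k + 1:]
        grad.append((cost(A, B, up, n) - cost(A, B, down, n)) / (2 * h))
    return grad

rng = random.Random(0)
n, m = 2, 3                            # arbitrary test sizes (assumption)
A = [[rng.uniform(-1, 1) for _ in range(n * m)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n * m)] for _ in range(n)]
beta = [rng.uniform(-1, 1) for _ in range(m)]

analytic = gradient(A, B, beta, n)
numeric = finite_difference(A, B, beta, n)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_err)  # should be small; the cost is quadratic in beta
```

Since the cost is quadratic in $\beta$, central differences carry no truncation error here, so any visible mismatch would point to a mistake in the formula or the indexing rather than to the step size.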