Applying chain rule to a trace formula in matrix calculus


I am trying to differentiate ${\rm tr}(A(X\otimes I_n))$ with respect to $X$. What I have in mind is using the chain rule, but I am not sure whether it is valid in matrix calculus:

$$ \frac{\partial\,{\rm tr}(A(X\otimes I_n))}{\partial X}=\frac{\partial\,{\rm tr}(A(X\otimes I_n))}{\partial (X\otimes I_n)}\,\frac{\partial (X\otimes I_n)}{\partial X} $$

Is this correct? If so, can somebody point me to a reference that justifies this step?

Thank you.


There are 3 best solutions below

0
On BEST ANSWER

You can prove the chain rule for matrix calculus in the exact way you prove it for ordinary calculus, as seen here. The only necessary change is to divide by $|H|$ in the definition of the derivative and to multiply by $|H|$ and $|K|$ in the formulae given for $g(X+H)$ and $f(Y+K)$, respectively. So, yes, this will work fine.
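Spelled out, and assuming the linked proof uses the usual error-term formulation (the symbols $\varepsilon$, $\eta$ and the notation $Dg(X)[H]$ are introduced here and are not taken from the linked proof), the modified statements would read roughly as follows, with $|\cdot|$ any matrix norm: $$ \eqalign{ g(X+H) &= g(X) + Dg(X)[H] + |H|\,\varepsilon(H), \qquad \varepsilon(H)\to 0 \text{ as } H\to 0 \cr f(Y+K) &= f(Y) + Df(Y)[K] + |K|\,\eta(K), \qquad \eta(K)\to 0 \text{ as } K\to 0 \cr } $$ Substituting $Y=g(X)$ and $K=g(X+H)-g(X)$ into the second line then gives the chain rule in the form $$ D(f\circ g)(X)[H] = Df(g(X))\big[Dg(X)[H]\big]. $$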

1
On

The mapping $X\mapsto {\rm tr}(A(X\otimes I_n))=:F(X)$ is linear. Hence the derivative at $X$ in the direction $\delta X$ is given by $$ F'(X)\delta X = {\rm tr}(A(\delta X\otimes I_n)). $$ You do not need the chain rule in this special case. Your reasoning is still correct.
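The linearity claim can be checked numerically. The sketch below is only an illustration under stated assumptions: $X$ is taken to be $p\times q$, so $A$ is $qn\times pn$, and the helper names (`kron`, `F`, `gradient`) and the test matrices are made up for this example. Expanding the trace entrywise gives $F(X)=\sum_{i,j} G_{ij}X_{ij}$ with $G_{ij}=\sum_{k} A_{(j,k),(i,k)}$, which is what `gradient` computes.

```python
def kron(P, Q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def trace_of_product(A, M):
    """tr(A M) computed as sum over a, b of A[a][b] * M[b][a]."""
    return sum(A[a][b] * M[b][a] for a in range(len(A)) for b in range(len(M)))


def F(A, X, n):
    """F(X) = tr(A (X kron I_n))."""
    return trace_of_product(A, kron(X, identity(n)))


def gradient(A, p, q, n):
    """G[i][j] = sum_k A[j*n + k][i*n + k], so F(X) = sum_ij G[i][j] * X[i][j]."""
    return [[sum(A[j * n + k][i * n + k] for k in range(n)) for j in range(q)]
            for i in range(p)]


# Hypothetical sizes and integer data, chosen so all arithmetic is exact.
p, q, n = 2, 3, 2
A = [[(3 * a + 5 * b) % 7 - 3 for b in range(p * n)] for a in range(q * n)]
X = [[1, 2, 3], [4, 5, 6]]
dX = [[1, 0, -1], [2, 1, 0]]

G = gradient(A, p, q, n)
X_plus = [[X[i][j] + dX[i][j] for j in range(q)] for i in range(p)]
directional = sum(G[i][j] * dX[i][j] for i in range(p) for j in range(q))

# Because F is linear, the increment equals F(dX) exactly, with no remainder.
assert F(A, dX, n) == directional
assert F(A, X_plus, n) - F(A, X, n) == directional
print("gradient:", G)
```

Since the increment does not depend on the base point $X$, the derivative is the same linear map everywhere, as the answer states.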

0
On

Assuming you can find a Kronecker factorization of $A^T$ as $$A^T = B\otimes C$$ where $B,C$ have the same dimensions as $X,I$ (respectively), then your problem has a very nice solution.

Express your function in terms of the Frobenius product: $$ \eqalign{ f &= (X\otimes I):A^T \cr &= (X\otimes I):(B\otimes C) \cr &= (X:B)\,(I:C) \cr &= (X:B)\,\,{\rm tr}(C) \cr } $$ for which the differential is trivial $$ \eqalign{ df &= dX:B\,\,{\rm tr}(C) \cr } $$ as is the derivative $$ \eqalign{ \frac {\partial f} {\partial X} &= B\,\,{\rm tr}(C)\cr } $$
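As a sanity check of $\partial f/\partial X = B\,{\rm tr}(C)$, here is a small sketch in plain Python. The matrices `B` and `C` are arbitrary illustrative choices, and the helper names are hypothetical. Because $f$ is linear in $X$, the partial derivative with respect to $X_{ij}$ equals $f$ evaluated at the unit matrix $E_{ij}$, which is what the code compares against.

```python
def kron(P, Q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def transpose(M):
    return [list(col) for col in zip(*M)]


def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def trace_of_product(A, M):
    """tr(A M) computed as sum over a, b of A[a][b] * M[b][a]."""
    return sum(A[a][b] * M[b][a] for a in range(len(A)) for b in range(len(M)))


def unit(i, j, p, q):
    """p x q matrix with a single 1 in position (i, j)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(q)] for r in range(p)]


# Illustrative data: B has the shape of X, C has the shape of I_n.
B = [[1, -2, 0], [3, 1, 4]]
C = [[2, 1], [5, 1]]
p, q, n = len(B), len(B[0]), len(C)

A = transpose(kron(B, C))            # so that A^T = B kron C
tr_C = sum(C[k][k] for k in range(n))


def f(X):
    return trace_of_product(A, kron(X, identity(n)))


claimed = [[b * tr_C for b in row] for row in B]                      # B tr(C)
numeric = [[f(unit(i, j, p, q)) for j in range(q)] for i in range(p)]  # df/dX_ij

assert numeric == claimed
print("gradient:", numeric)
```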

Update:

Even if the matrix can't be factored, it can be decomposed into a finite sum of Kronecker products $$ \eqalign{ A^T &= \sum_{k=1}^r B_k\otimes C_k \cr \frac {\partial f} {\partial X} &= \sum_{k=1}^rB_k\,\,{\rm tr}(C_k)\cr }$$ where $r$ is the rank of the vecpose of $A$.
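One decomposition that always exists, though it generally uses more than $r$ terms, is the block decomposition. As a sketch under assumed notation (none of it is from the answer above): let $X$ be $p\times q$, let $E_{ij}$ be the $p\times q$ matrix with a single $1$ in position $(i,j)$, and let $C_{ij}$ be the $(i,j)$ block of size $n\times n$ of $A^T$. Then $$ \eqalign{ A^T &= \sum_{i=1}^p\sum_{j=1}^q E_{ij}\otimes C_{ij} \cr \frac {\partial f} {\partial X} &= \sum_{i=1}^p\sum_{j=1}^q E_{ij}\,\,{\rm tr}(C_{ij}) \cr } $$ so the $(i,j)$ entry of the gradient is simply the trace of the $(i,j)$ block of $A^T$, and no factorization needs to be computed.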