Compute the derivative of $\mbox{tr}(AXB)$ with respect to $X$


Given matrices $A, B \in \Bbb R^{2 \times 2}$, compute the derivative of $\mbox{tr}(AXB)$ with respect to $X \in \Bbb R^{2 \times 2}$.


I know that $\frac{\partial \, \mbox{tr}(AXB)}{\partial X}$ is the same as $\mbox{tr}\left(\frac{\partial (AXB)}{\partial X}\right)$.

I am not sure I understand how derivatives with respect to a matrix work, but this is what I got at first, and I wanted to know whether it is correct:



$$ \Large\frac{\partial}{\partial {\bf X}} \mbox{tr} \left( {\bf f} ({\bf X}) \right) = \mbox{tr} \left( \frac{\partial {\bf f} ({\bf X})}{\partial {\bf X}} \right) $$

My idea was to use this formula on $f(X) := AXB$ and then compute its derivative. However, I got a tensor, i.e., a $4$-dimensional array. If I compute the trace of this tensor, I seem to get something different. Is that right?


There are 2 best solutions below

On BEST ANSWER

More generally, given matrices ${\bf A}, {\bf B} \in \Bbb R^{n \times n}$, let the linear scalar field $f : \Bbb R^{n \times n} \to \Bbb R$ be defined by

$$ f ( {\bf X} ) := \mbox{tr} \left( {\bf A} {\bf X} {\bf B} \right) = \mbox{tr} \left( {\bf B} {\bf A} {\bf X} \right) = \left\langle \color{blue}{{\bf A}^\top {\bf B}^\top}, {\bf X} \right\rangle$$

where the cyclic property of the trace and the Frobenius inner product were used. Hence, the gradient of $f$ with respect to ${\bf X}$ is

$$ \boxed{ \nabla_{{\bf X}} f ({\bf X}) = \color{blue}{{\bf A}^\top {\bf B}^\top} }$$
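As a sanity check, the boxed formula can be compared against a finite-difference approximation of the gradient. The following is only a sketch: the helper names (`matmul`, `transpose`, `trace`, `f`, `numerical_gradient`) and the sample $2 \times 2$ matrices are chosen here for illustration and do not come from the question.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def f(A, X, B):
    """The scalar function f(X) = tr(A X B)."""
    return trace(matmul(matmul(A, X), B))


def numerical_gradient(A, X, B, h=1e-6):
    """Central-difference estimate of the matrix of partials d f / d x_ij."""
    rows, cols = len(X), len(X[0])
    G = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(A, X_plus, B) - f(A, X_minus, B)) / (2 * h)
    return G


# Arbitrary sample matrices (illustrative values only).
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [-1.0, 2.0]]
X = [[0.5, -1.5], [2.0, 0.25]]

closed_form = matmul(transpose(A), transpose(B))  # A^T B^T
numeric = numerical_gradient(A, X, B)
max_error = max(abs(closed_form[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))

print(closed_form)  # [[3.0, 5.0], [4.0, 6.0]]
print(max_error)    # small, limited only by floating-point rounding
```

Because $f$ is linear in ${\bf X}$, the central difference has no truncation error, so any discrepancy comes from rounding alone, and the result does not depend on the particular ${\bf X}$ chosen.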


Addendum

Let $\partial_{ij} := \partial_{x_{ij}}$. Hence,

$$ \partial_{ij} f ({\bf X}) = \partial_{ij} \mbox{tr} \left( {\bf A} {\bf X} {\bf B} \right) = \mbox{tr} \left( {\bf A} \left( \partial_{ij} {\bf X} \right) {\bf B} \right) = \mbox{tr} \left( {\bf A} \, {\bf e}_i {\bf e}_j^\top {\bf B} \right) = \cdots = \left( {\bf A}^\top {\bf B}^\top \right)_{ij} $$

where $\left( {\bf M} \right)_{ij}$ denotes the $(i,j)$-th entry of matrix ${\bf M}$. Thus, the gradient of $f$ is $\nabla_{{\bf X}} f ({\bf X}) = {\bf A}^\top {\bf B}^\top$.
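
For reference, one way to fill in the steps elided by the dots above uses only the cyclic property of the trace and the fact that a $1 \times 1$ matrix equals its own trace:

$$ \mbox{tr} \left( {\bf A} \, {\bf e}_i {\bf e}_j^\top {\bf B} \right) = \mbox{tr} \left( {\bf e}_j^\top {\bf B} {\bf A} \, {\bf e}_i \right) = {\bf e}_j^\top {\bf B} {\bf A} \, {\bf e}_i = \left( {\bf B} {\bf A} \right)_{ji} = \left( \left( {\bf B} {\bf A} \right)^\top \right)_{ij} = \left( {\bf A}^\top {\bf B}^\top \right)_{ij} $$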

On

In the following, the colon operator denotes the Frobenius inner product $$ \mathbf{U}:\mathbf{V} = \mathrm{tr}(\mathbf{U}^T \mathbf{V}) $$

The cost function can be written as $\phi(\mathbf{X}) = \mathrm{tr}(\mathbf{AXB}) = \mathrm{tr}(\mathbf{BAX}) = (\mathbf{BA})^T:\mathbf{X} $

In the denominator layout convention, the gradient is $$ \frac{\partial \phi}{\partial \mathbf{X}} = (\mathbf{BA})^T $$
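
The same result can be read off in index notation, which also shows how it relates to the $4$-index object mentioned in the question. This is a sketch; the lowercase entries $a_{ki}$, $x_{ij}$, $b_{jl}$ of $\mathbf{A}$, $\mathbf{X}$, $\mathbf{B}$ are notation introduced here. Since $(\mathbf{AXB})_{kl} = \sum_{i,j} a_{ki} \, x_{ij} \, b_{jl}$,

$$ \frac{\partial (\mathbf{AXB})_{kl}}{\partial x_{ij}} = a_{ki} \, b_{jl}, \qquad \frac{\partial \phi}{\partial x_{ij}} = \sum_{k} a_{ki} \, b_{jk} = (\mathbf{BA})_{ji} = \left( (\mathbf{BA})^T \right)_{ij} $$

The "trace" of the $4$-index derivative therefore has to contract the two output indices $k = l$, which leaves a matrix indexed by $(i,j)$. In the numerator layout convention, the same partial derivatives would be arranged as the transpose, $\mathbf{BA}$.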