How can I compute $\frac{d\,(X_u A X B X_u^T)}{d X_u}$, where $X, A, B \in \mathbb{R}^{n \times n}$ and $X_u$ is the $u$-th row of $X$?
Derivative of $X_u A X B X_u^T$ w.r.t. $X_u$
246 views. Asked by Bumbble Comm. There are 2 solutions below.
Given the Cartesian column vector $e_u$, the row vector can be written as $$X_u=e_u^TX$$ Use the Frobenius Inner Product to write the function and its differential $$\eqalign{ f &= XAXBX^T:e_ue_u^T \cr df &= (dX\,AXBX^T + XA\,dX\,BX^T + XAXB\,dX^T):e_ue_u^T \cr &= (e_ue_u^TXB^TX^TA^T + A^TX^Te_ue_u^TXB^T + e_ue_u^TXAXB):dX \cr }$$ Since $df=\big(\frac{\partial f}{\partial X}:dX\big),\,$ the gradient wrt $X$ is $$\eqalign{ \frac{\partial f}{\partial X} &= e_ue_u^TXB^TX^TA^T + A^TX^Te_ue_u^TXB^T + e_ue_u^TXAXB \cr }$$ To find the gradient wrt $X_u$, multiply the matrix gradient by $e_u^T$ $$\eqalign{ \frac{\partial f}{\partial X_u} &= e_u^T\frac{\partial f}{\partial X} \cr &= e_u^Te_ue_u^TXB^TX^TA^T + e_u^TA^TX^Te_ue_u^TXB^T + e_u^Te_ue_u^TXAXB \cr &= X_uB^TX^TA^T + e_u^TA^TX_u^TX_uB^T + X_uAXB \cr }$$
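The final row-gradient formula above can be sanity-checked by finite differences. Below is a minimal sketch of such a check (my own code, not part of the original answer; all variable names are mine): it compares the analytical expression $X_uB^TX^TA^T + (e_u^TA^TX_u^T)\,X_uB^T + X_uAXB$ against central differences over the entries of row $u$.

```python
# Finite-difference check of the gradient of f = X_u A X B X_u^T w.r.t. X_u.
# (Sketch code; names and the test setup are assumptions, not from the answer.)
import numpy as np

rng = np.random.default_rng(0)
n, u, h = 4, 2, 1e-6
A, B, X = rng.standard_normal((3, n, n))  # random test matrices

def f(X):
    # X[u] is the u-th row X_u; the result is the scalar X_u A X B X_u^T
    return X[u] @ A @ X @ B @ X[u]

e = np.zeros(n); e[u] = 1.0
Xu = X[u]

# Analytical gradient w.r.t. the row X_u, as derived above.
# Note (e_u^T A^T X_u^T) is the scalar (XA)_{uu}.
g = Xu @ B.T @ X.T @ A.T + (e @ A.T @ Xu) * (Xu @ B.T) + Xu @ A @ X @ B

# Empirical gradient: perturb each entry of row u in turn.
g_num = np.zeros(n)
for j in range(n):
    Xp = X.copy(); Xp[u, j] += h
    Xm = X.copy(); Xm[u, j] -= h
    g_num[j] = (f(Xp) - f(Xm)) / (2 * h)

print(np.max(np.abs(g - g_num)))  # max abs difference; should be tiny
```

Note that perturbing row $u$ of $X$ moves all three occurrences of $X$ in $f$, which is why the analytical gradient has three terms.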
Let $f_u(X)=X_uAXBX_u^\mathrm{T}$ and $X_u=\mathbf{e}_u^\mathrm{T}X$, where $\mathbf{e}_u=(\delta_{iu})_{1\le i\le n}\in\mathbb{R}^{n}$ is the column vector whose $u$-th entry is 1 and all other entries are 0. We have
\begin{align*} &\frac{d\,X_uAXBX_u^\mathrm{T}}{dX}\\ =&\frac{d\,\mathbf{e}_u^\mathrm{T}XAXBX^\mathrm{T}\mathbf{e}_u}{dX}\\ =&\frac{d\,\mathrm{tr}(\mathbf{e}_u^\mathrm{T}XAXBX^\mathrm{T}\mathbf{e}_u)}{dX}\\ =&\frac{d\,\mathrm{tr}(XAXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T})}{dX} \end{align*} and \begin{align*} &d\,\mathrm{tr}(XAXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T})\\ =&\mathrm{tr}\left(dX\cdot A XB X^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\right) +\mathrm{tr}\left(XA \cdot dX\cdot BX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\right) +\mathrm{tr}\left(XA XB\cdot d(X^\mathrm{T})\cdot\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\right)\\ =&\mathrm{tr}\left(A XB X^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\cdot dX\right) +\mathrm{tr}\left(BX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}XA \cdot dX\right) +\mathrm{tr}\left(B^\mathrm{T}X^\mathrm{T}A^\mathrm{T}X^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\cdot dX\right) \end{align*} So $$ \frac{d\,X_uAXBX_u^\mathrm{T}}{dX}=AXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}+BX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}XA+B^\mathrm{T}X^\mathrm{T}A^\mathrm{T}X^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T} $$
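This matrix gradient can also be verified numerically under the trace convention $df=\mathrm{tr}\big(\frac{df}{dX}\,dX\big)$ used here. The following is a sketch of such a check (my own code, not the answerer's; the setup and names are assumptions):

```python
# Check that df ≈ tr(G dX) for the gradient G derived above, with a small
# random perturbation dX. (Sketch code; all names are assumptions.)
import numpy as np

rng = np.random.default_rng(1)
n, u = 5, 1
A, B, X = rng.standard_normal((3, n, n))
e = np.zeros((n, 1)); e[u] = 1.0  # column vector e_u

def f(X):
    # scalar f_u(X) = e_u^T X A X B X^T e_u
    return float(e.T @ X @ A @ X @ B @ X.T @ e)

# G = AXBX^T e e^T + BX^T e e^T XA + B^T X^T A^T X^T e e^T
G = (A @ X @ B @ X.T @ e @ e.T
     + B @ X.T @ e @ e.T @ X @ A
     + B.T @ X.T @ A.T @ X.T @ e @ e.T)

dX = 1e-6 * rng.standard_normal((n, n))
df_pred = np.trace(G @ dX)          # predicted first-order change tr(G dX)
df_true = f(X + dX) - f(X)          # actual change
print(abs(df_pred - df_true))       # O(||dX||^2), so much smaller than df
```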
The derivative $df_u(X)/dX$ maps $dX$ to $df_u(X)$ via $df_u(X)=\mathrm{tr}\big(\frac{df_u(X)}{dX}\,dX\big)$.
To calculate $df_u(X)/dX_u$, restrict $dX$ so that only its $u$-th row is non-zero. Writing the trace entrywise, $df_u(X)=\sum_{i,j}\big(\frac{df_u(X)}{dX}\big)_{ij}(dX)_{ji}$, and $(dX)_{ji}\ne 0$ only when $j=u$, so only the $u$-th column of $df_u(X)/dX$ contributes to $df_u(X)$. Hence $df_u(X)/dX_u$ is the $u$-th column of $df_u(X)/dX$.
I did some simulations in Python as well. One can see that `df_a` (`a` for "analytical") is roughly the same as `df_e` (`e` for "empirical").
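The simulation code itself does not appear above; here is a minimal reconstruction of such a comparison (my own sketch, reusing only the names `df_a` and `df_e` from the text). It takes the $u$-th column of the matrix gradient as the analytical answer and compares it with central differences:

```python
# Reconstruction of the df_a vs. df_e comparison described in the text.
# (Sketch code; everything except the names df_a / df_e is an assumption.)
import numpy as np

rng = np.random.default_rng(42)
n, u, h = 4, 0, 1e-6
A, B, X = rng.standard_normal((3, n, n))
e = np.zeros((n, 1)); e[u] = 1.0

def f(X):
    # scalar f_u(X) = X_u A X B X_u^T
    return float(X[u] @ A @ X @ B @ X[u])

# df_a ("analytical"): the u-th column of df_u(X)/dX from the derivation above
G = (A @ X @ B @ X.T @ e @ e.T
     + B @ X.T @ e @ e.T @ X @ A
     + B.T @ X.T @ A.T @ X.T @ e @ e.T)
df_a = G[:, u]

# df_e ("empirical"): central differences over the entries of row u of X
df_e = np.zeros(n)
for j in range(n):
    Xp, Xm = X.copy(), X.copy()
    Xp[u, j] += h; Xm[u, j] -= h
    df_e[j] = (f(Xp) - f(Xm)) / (2 * h)

print(np.max(np.abs(df_a - df_e)))  # should be tiny
```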