Derivative of $X_u A X B X_u^T$ w.r.t. $X_u$


How can I compute $\frac{d\, X_u A X B X_u^\mathrm{T}}{d X_u}$, where $A, B, X \in \mathbb{R}^{n \times n}$ and $X_u$ is the $u$-th row of $X$?



BEST ANSWER

Let $f_u(X)=X_uAXBX_u^\mathrm{T}$ and write $X_u=\mathbf{e}_u^\mathrm{T}X$, where $\mathbf{e}_u=(\delta_{iu})^{\mathrm{T}}_{1\le i\le n}\in\mathbb{R}^{n}$ is the standard basis column vector (the $u$-th entry is 1 and all others are 0). We have

\begin{align*}
&\frac{d\,X_uAXBX_u^\mathrm{T}}{dX}\\
=&\frac{d\,\mathbf{e}_u^\mathrm{T}XAXBX^\mathrm{T}\mathbf{e}_u}{dX}\\
=&\frac{d\,\mathrm{tr}(\mathbf{e}_u^\mathrm{T}XAXBX^\mathrm{T}\mathbf{e}_u)}{dX}\\
=&\frac{d\,\mathrm{tr}(XAXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T})}{dX}
\end{align*}
and
\begin{align*}
&d\,\mathrm{tr}(XAXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T})\\
=&\mathrm{tr}\left(dX\cdot AXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\right)
+\mathrm{tr}\left(XA\cdot dX\cdot BX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\right)
+\mathrm{tr}\left(XAXB\cdot d(X^\mathrm{T})\cdot\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\right)\\
=&\mathrm{tr}\left(AXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\cdot dX\right)
+\mathrm{tr}\left(BX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}XA\cdot dX\right)
+\mathrm{tr}\left(B^\mathrm{T}X^\mathrm{T}A^\mathrm{T}X^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}\cdot dX\right)
\end{align*}
So
$$
\frac{d\,X_uAXBX_u^\mathrm{T}}{dX}=AXBX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}+BX^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}XA+B^\mathrm{T}X^\mathrm{T}A^\mathrm{T}X^\mathrm{T}\mathbf{e}_u\mathbf{e}_u^\mathrm{T}
$$

Here the derivative $df_u(X)/dX$ maps $dX$ to $df_u(X)$ via $df_u(X)=\mathrm{tr}\!\left(\frac{df_u(X)}{dX}\,dX\right)$.

To calculate $df_u(X)/dX_u$, allow only the $u$-th row of $dX$ to be non-zero. Then only the $u$-th column of $df_u(X)/dX$ contributes to $df_u(X)$, so $df_u(X)/dX_u$ is the $u$-th column of $df_u(X)/dX$.
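To make that step explicit: with only the $u$-th row perturbed we can write $dX=\mathbf{e}_u\,dX_u$ (a rank-one matrix whose $u$-th row is $dX_u$), and the cyclic property of the trace gives
$$
df_u(X)=\mathrm{tr}\!\left(\frac{df_u(X)}{dX}\,\mathbf{e}_u\,dX_u\right)=dX_u\,\frac{df_u(X)}{dX}\,\mathbf{e}_u,
$$
i.e. the inner product of the row $dX_u$ with the $u$-th column of $df_u(X)/dX$.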

I did some simulations in Python as well:

import numpy as np


def f(X, u, A, B):
    # f_u(X) = X_u A X B X_u^T (a scalar)
    Xu = X[u, :]
    return Xu @ A @ X @ B @ Xu  # Xu is 1-D, so no explicit transpose is needed


def f_grad(X, u, A, B):
    # df_u/dX = A X B X^T e_u e_u^T + B X^T e_u e_u^T X A + B^T X^T A^T X^T e_u e_u^T
    n = X.shape[0]
    eu_eu_T = np.zeros((n, n))
    eu_eu_T[u, u] = 1
    part1 = A @ X @ B @ X.T @ eu_eu_T
    part2 = B @ X.T @ eu_eu_T @ X @ A
    part3 = B.T @ X.T @ A.T @ X.T @ eu_eu_T
    return part1 + part2 + part3


if __name__ == '__main__':
    n = 4
    A = np.random.randn(n, n)
    B = np.random.randn(n, n)
    X = np.random.randn(n, n)
    u = 1
    eps = 1e-8

    grad = f_grad(X, u, A, B)
    f_X = f(X, u, A, B)

    for i in range(20):
        # perturb only the u-th row of X
        dXu = np.random.randn(n) * eps
        dX = np.zeros((n, n))
        dX[u, :] = dXu
        # analytical: inner product of dXu with the u-th column of the gradient
        df_a = np.dot(grad[:, u], dXu)
        # empirical: first-order change in f
        df_e = f(X + dX, u, A, B) - f_X
        print('df_a = %g\tdf_e = %g' % (df_a, df_e))

One can see that df_a (a for "analytical") is roughly the same as df_e (e for "empirical").

ANOTHER ANSWER

Given the Cartesian column vector $e_u$, the row vector can be written as
$$X_u=e_u^TX$$
Use the Frobenius inner product to write the function and its differential
$$\eqalign{
f &= XAXBX^T:e_ue_u^T \cr
df &= (dX\,AXBX^T + XA\,dX\,BX^T + XAXB\,dX^T):e_ue_u^T \cr
&= (e_ue_u^TXB^TX^TA^T + A^TX^Te_ue_u^TXB^T + e_ue_u^TXAXB):dX \cr
}$$
Since $df=\big(\frac{\partial f}{\partial X}:dX\big),\,$ the gradient wrt $X$ is
$$\eqalign{
\frac{\partial f}{\partial X} &= e_ue_u^TXB^TX^TA^T + A^TX^Te_ue_u^TXB^T + e_ue_u^TXAXB \cr
}$$
To find the gradient wrt $X_u$, multiply the matrix gradient by $e_u^T$
$$\eqalign{
\frac{\partial f}{\partial X_u} &= e_u^T\frac{\partial f}{\partial X} \cr
&= e_u^Te_ue_u^TXB^TX^TA^T + e_u^TA^TX^Te_ue_u^TXB^T + e_u^Te_ue_u^TXAXB \cr
&= X_uB^TX^TA^T + e_u^TA^TX_u^TX_uB^T + X_uAXB \cr
}$$
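As a quick sanity check (not part of the original answer), the final row-vector formula can be compared against central finite differences with respect to the $u$-th row; the variable names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u = 4, 1
A, B, X = rng.standard_normal((3, n, n))
e_u = np.zeros(n)
e_u[u] = 1.0
Xu = X[u, :]

# df/dX_u = X_u B^T X^T A^T + (e_u^T A^T X_u^T) X_u B^T + X_u A X B
# (the middle term is a scalar, (XA)_{uu}, times the row X_u B^T)
grad_row = (Xu @ B.T @ X.T @ A.T
            + (e_u @ A.T @ Xu) * (Xu @ B.T)
            + Xu @ A @ X @ B)


def f(X):
    return X[u, :] @ A @ X @ B @ X[u, :]


# central finite differences over the entries of the u-th row only
eps = 1e-6
fd = np.zeros(n)
for j in range(n):
    Xp, Xm = X.copy(), X.copy()
    Xp[u, j] += eps
    Xm[u, j] -= eps
    fd[j] = (f(Xp) - f(Xm)) / (2 * eps)

print(np.allclose(grad_row, fd, rtol=1e-4, atol=1e-7))
```

Note that `grad_row` is the transpose of the $u$-th column produced by the first answer, since this answer uses the convention $df=\frac{\partial f}{\partial X}:dX$ rather than $df=\mathrm{tr}\!\left(\frac{\partial f}{\partial X}\,dX\right)$.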