Derivative of quadratic matrix product (not the vector case $d(xAx^T) = x(A + A^T)$), $A$ is not symmetric


I am trying to differentiate the quadratic matrix product $M = XUX^T$ with respect to the matrix $X$; that is, I want $\frac{\partial M}{\partial X}$. Working from a concrete example with matrices $M, X, U \in \mathbb{R}^{4,4}$ and a particular element $x_{3,1}$, my plan was to find the general form of the derivative of a single element, which would then lead to the matrix form. A single element of $X$ appears 8 times in the computation of the product $M$: 4 times on the left and 4 times on the right. My plan was to compute the left and right derivatives and then add them. Examining a single entry (transpose accounted for in the subscripts, 1-indexed),

$$ m_{3,2} = \\ (x_{3, 1}\cdot u_{1, 1} + x_{3, 2}\cdot u_{2, 1} + x_{3, 3}\cdot u_{3, 1} + x_{3, 4}\cdot u_{4, 1}) \cdot x_{2,1} \\ + (x_{3, 1}\cdot u_{1, 2} + x_{3, 2}\cdot u_{2, 2} + x_{3, 3}\cdot u_{3, 2} + x_{3, 4}\cdot u_{4, 2}) \cdot x_{2,2} \\ + (x_{3, 1}\cdot u_{1, 3} + x_{3, 2}\cdot u_{2, 3} + x_{3, 3}\cdot u_{3, 3} + x_{3, 4}\cdot u_{4, 3}) \cdot x_{2,3} \\ + (x_{3, 1}\cdot u_{1, 4} + x_{3, 2}\cdot u_{2, 4} + x_{3, 3}\cdot u_{3, 4} + x_{3, 4}\cdot u_{4, 4}) \cdot x_{2,4} $$

then $$ \frac {\partial m_{3,2} } {\partial x_{3,1}} = u_{1, 1} \cdot x_{2,1} + u_{1, 2} \cdot x_{2,2} + u_{1, 3} \cdot x_{2,3} + u_{1, 4} \cdot x_{2,4} = u_{1, :} \cdot x_{2,:}^T $$
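As a quick sanity check, this single-element derivative can be confirmed by finite differences. A minimal sketch with random data (not from the original post; NumPy, indices shifted to 0-based, so the text's $m_{3,2}$ and $x_{3,1}$ become `M[2,1]` and `X[2,0]`):

```python
# Spot-check of d m_{3,2} / d x_{3,1} = u_{1,:} . x_{2,:}^T for M = X U X^T.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))

def m_entry(X):
    return (X @ U @ X.T)[2, 1]          # m_{3,2} in the text's 1-based indexing

# Central finite difference with respect to x_{3,1} (i.e. X[2, 0])
eps = 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[2, 0] += eps
Xm[2, 0] -= eps
fd = (m_entry(Xp) - m_entry(Xm)) / (2 * eps)

analytic = U[0, :] @ X[1, :]            # u_{1,:} . x_{2,:}^T
print(abs(fd - analytic))               # small (finite-difference error)
```

Note that because row 2 of $X$ differs from row 3, the right-hand factor $X^T$ contributes nothing to this particular partial, matching the formula above.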

Each entry of $X$ is used 4 times on the left side, so differentiating the left side in all 4 places where it appears gives (the transposes cancelling out),

$$\frac {\partial m_{3,:}}{\partial x_{3,1}} = u_{1,:}\cdot X$$

$x_{3,1}$ will also appear 4 times on the right side (accounting for the transpose):

$$m_{i,3} = \\ (x_{i, 1}\cdot u_{1, 1} + x_{i, 2}\cdot u_{2, 1} + x_{i, 3}\cdot u_{3, 1} + x_{i, 4}\cdot u_{4, 1}) \cdot x_{3,1} \\ + (x_{i, 1}\cdot u_{1, 2} + x_{i, 2}\cdot u_{2, 2} + x_{i, 3}\cdot u_{3, 2} + x_{i, 4}\cdot u_{4, 2}) \cdot x_{3,2} \\ + (x_{i, 1}\cdot u_{1, 3} + x_{i, 2}\cdot u_{2, 3} + x_{i, 3}\cdot u_{3, 3} + x_{i, 4}\cdot u_{4, 3}) \cdot x_{3,3} \\ + (x_{i, 1}\cdot u_{1, 4} + x_{i, 2}\cdot u_{2, 4} + x_{i, 3}\cdot u_{3, 4} + x_{i, 4}\cdot u_{4, 4}) \cdot x_{3,4} $$

suggesting that the right-side derivative is,

$$\frac{\partial m_{:,3}}{\partial x_{3,1}} = X \cdot u_{:,1}$$

So the derivative for entry $x_{3,1}$ is $$\frac {\partial M}{\partial x_{3,1}} = u_{1,:} \cdot X + X \cdot u_{:,1}$$ or, more generally, for any element $x_{i,j}$, $$\frac {\partial M}{\partial x_{i,j}} = u_{j,:} \cdot X + X \cdot u_{:,j}$$

This led me to conclude that the desired derivative is, $$\frac {\partial M}{\partial X} = U \cdot X + X \cdot U^T$$

Verification with numerical auto-differentiators produces very different results. What am I missing?
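For context on what an auto-differentiator actually returns here: the derivative of a $4{\times}4$ matrix with respect to a $4{\times}4$ matrix is a fourth-order tensor $J_{a,b,m,n} = \partial M_{a,b}/\partial X_{m,n}$ with $256$ entries, which no single $4{\times}4$ matrix can hold. A minimal NumPy sketch (random data, central finite differences, 0-based indices) shows this shape:

```python
# Build the full derivative tensor J[a, b, m, n] = dM[a, b]/dX[m, n]
# for M = X U X^T by perturbing one entry of X at a time.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))
eps = 1e-6

J = np.zeros((4, 4, 4, 4))
for m in range(4):
    for n in range(4):
        Xp, Xm = X.copy(), X.copy()
        Xp[m, n] += eps
        Xm[m, n] -= eps
        J[:, :, m, n] = ((Xp @ U @ Xp.T) - (Xm @ U @ Xm.T)) / (2 * eps)

print(J.shape)   # (4, 4, 4, 4): 256 partials, vs. 16 entries in a 4x4 matrix
```

The single partial computed earlier, $\partial m_{3,2}/\partial x_{3,1} = u_{1,:} \cdot x_{2,:}^T$, appears in this tensor as `J[2, 1, 2, 0]`.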


There are 2 answers below.

Answer (score 0):

I will employ the Einstein summation convention: repeated indices are summed over. We can write the $(i,j)$th component of $M$ as
\begin{align} M^i_j=X^i_kU^k_l(X^T)^l_j \end{align}

This is a sum of real numbers. If we interpret the $X^i_k$ as coordinates on the space of $n$ by $n$ matrices (which is fine: they form a vector space, and this is a basis), then by the product rule
\begin{align} \frac{\partial M^i_j}{\partial X^m_n}=&\frac{\partial X^i_k}{\partial X^m_n}U^k_l(X^T)^l_j+X^i_kU^k_l\frac{\partial (X^T)^l_j}{\partial X^m_n} \end{align}

Now, we see that
\begin{align} \frac{\partial X^i_k}{\partial X^m_n}=\begin{cases} 1&\text{if }i=m \text{ and } k=n\\ 0&\text{otherwise} \end{cases} \end{align}
and
\begin{align} \frac{\partial (X^T)^l_j}{\partial X^m_n}=\begin{cases} 1&\text{if }j=m \text{ and } l=n\\ 0&\text{otherwise} \end{cases} \end{align}
hence
\begin{align} \frac{\partial M^i_j}{\partial X^m_n}=&\delta^i_m\delta_k^nU^k_l(X^T)^l_j +X^i_kU^k_l\delta^l_n\delta_j^m\\ =&\delta^i_mU^n_l(X^T)^l_j+X^i_kU^k_n\delta^m_j\\ =&\delta^i_m(U\cdot X^T)^n_j+\delta^m_j(X\cdot U)^i_n \end{align}

Now, if we let $e_i$ be a basis for $\mathbb R^n$ and $e^i$ its dual, we can write any linear endomorphism $A$ as the sum
$$A=A^i_je_i\otimes e^j$$
So we see that
\begin{align} \frac{\partial M}{\partial X^m_n}=&\frac{\partial M^i_j}{\partial X^m_n}e_i\otimes e^j\\ =&\delta^i_m(U\cdot X^T)^n_j e_i\otimes e^j+\delta^m_j(X\cdot U)^i_n e_i\otimes e^j\\ =&(U\cdot X^T)^n_je_m\otimes e^j +(X\cdot U)^i_n e_i\otimes e^m \end{align}

That is, $\partial M/\partial X^m_n$ is the sum of two matrices: the first is zero everywhere except in the $m$th row, whose $j$th entry is the $(n,j)$th component of $U\cdot X^T$; the second is zero everywhere except in the $m$th column, whose $i$th entry is the $(i,n)$th component of $X\cdot U$.
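The component formula above can be checked numerically. A minimal sketch (random $4{\times}4$ data, 0-based indices, NumPy `einsum` to build the analytic tensor):

```python
# Check  dM^i_j / dX^m_n = delta(i,m) (U X^T)[n,j] + delta(j,m) (X U)[i,n]
# against central finite differences for M = X U X^T.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))
eps = 1e-6

UXT, XU = U @ X.T, X @ U
I = np.eye(4)

# Analytic tensor J_analytic[i, j, m, n]
J_analytic = (np.einsum('im,nj->ijmn', I, UXT)
              + np.einsum('jm,in->ijmn', I, XU))

# Finite-difference tensor
J_fd = np.zeros((4, 4, 4, 4))
for m in range(4):
    for n in range(4):
        Xp, Xm = X.copy(), X.copy()
        Xp[m, n] += eps
        Xm[m, n] -= eps
        J_fd[:, :, m, n] = ((Xp @ U @ Xp.T) - (Xm @ U @ Xm.T)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_fd)))   # small (finite-difference error)
```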

How one would want to organize this into a Jacobian matrix is not clear to me. I would probably write it as a matrix of one-forms indexed by $dX^m_n$, where $dX^m_n$ is the linear functional on $\mathbb R^{n^2}$ that is identically zero on every vector with no $e_m\otimes e^n$ component and takes $e_m\otimes e^n$ to $1$.

Answer (score 1):

$ \newcommand\dd{\mathrm d} \newcommand\R{\mathbb R} \newcommand\vecop{\mathop{\mathrm{vec}}} $The total differential of $M : \R^{4,4} \to \R^{4,4}$ at $X$ is a linear map $\dd M_X : \R^{4,4} \to \R^{4,4}$. For any function $F$ we also write $\dot\dd[F(\dot X)]$ to mean $\dd F_X$. If $F(X) = G(X,X)$ for any $G$, then the following is always true: $$ \dot\dd[G(\dot X,\dot X)] = \dot\dd[G(\dot X,X)] + \dot\dd[G(X,\dot X)]. $$ In less transparent notation, with $L(X) = G(X,Y)$ and $R(X) = G(Y,X)$ for arbitrary $Y$ independent of $X$, $$ \dd F_X = [\dd L_X + \dd R_X]_{Y=X}. $$ In other words, after taking the differentials we substitute $X$ for $Y$.

Finally, for any function $F$ which is linear we have $$ \dd F_X = F. $$

These two rules allow us to easily compute $$ \dd M_X(H) = \dot\dd[\dot XU\dot X^T](H) = \dot\dd[\dot XUX^T](H) + \dot\dd[XU\dot X^T](H) = HUX^T + XUH^T, $$ the last equality following because the functions $X \mapsto XUY^T$ and $X \mapsto YUX^T$ are linear.
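This differential is easy to confirm with a finite-difference directional derivative. A minimal sketch with random data (not from the original answer):

```python
# Verify dM_X(H) = H U X^T + X U H^T for M(X) = X U X^T
# via a central difference along a random direction H.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))

M = lambda X: X @ U @ X.T
t = 1e-6
fd = (M(X + t * H) - M(X - t * H)) / (2 * t)
analytic = H @ U @ X.T + X @ U @ H.T

print(np.max(np.abs(fd - analytic)))   # small (finite-difference error)
```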

If we want to express this differential as a matrix, then we need to choose a way of representing $H \in \R^{4,4}$ as a vector in $\R^{16}$. Suppose we stack the columns of $H$ to get the vectorization $\vecop(H)$ of $H$. This operator satisfies a useful identity with the Kronecker product $\otimes$: $$ \vecop(AHB) = (B^T\otimes A)\vecop(H). $$ Then writing $$ \dd M_X(H) = IHUX^T + (IHU^TX^T)^T $$ with $I$ the $4{\times}4$ identity, we see that $\dd M_X$ can be represented by the matrix $$ (XU^T)\otimes I + T[(XU)\otimes I],\quad T = \begin{pmatrix} J_{1,1} & J_{2,1} & J_{3,1} & J_{4,1} \\ J_{1,2} & J_{2,2} & J_{3,2} & J_{4,2} \\ J_{1,3} & J_{2,3} & J_{3,3} & J_{4,3} \\ J_{1,4} & J_{2,4} & J_{3,4} & J_{4,4} \end{pmatrix} $$ where $J_{i,j}$ is the $4{\times}4$ matrix with $1$ in the $(i,j)$ entry and $0$ everywhere else; the matrix $T$ represents the $\R^{4,4}$ transpose operation on $\R^{16}$.
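A sketch (random data, NumPy) confirming both that the block matrix $T$ implements the transpose on vectorized matrices and that the $16{\times}16$ matrix above represents $\dd M_X$:

```python
# Check vec(dM_X(H)) = [ (X U^T) kron I  +  T (X U) kron I ] vec(H),
# where vec stacks columns and T is the commutation (transpose) matrix.
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))
I = np.eye(4)

vec = lambda A: A.reshape(-1, order='F')   # column stacking

# Build T from the single-entry matrices: block (i, j) of T is J_{j, i}
T = np.zeros((16, 16))
for i in range(4):
    for j in range(4):
        Jji = np.zeros((4, 4))
        Jji[j, i] = 1.0
        T[4*i:4*i+4, 4*j:4*j+4] = Jji

assert np.allclose(T @ vec(H), vec(H.T))   # T really transposes

Jac = np.kron(X @ U.T, I) + T @ np.kron(X @ U, I)
dM = H @ U @ X.T + X @ U @ H.T
print(np.max(np.abs(Jac @ vec(H) - vec(dM))))   # small (roundoff only)
```

The first Kronecker term comes from $\vecop(IH(UX^T)) = ((UX^T)^T \otimes I)\vecop(H)$, and the second from applying $T$ to $\vecop(IH(U^TX^T))$.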