I am trying to differentiate the quadratic matrix product, $M=XUX^T$, with respect to the matrix $X$; that is, I want $\frac{\partial M}{\partial X}$. Working from a concrete example with matrices $M,X,U \in \mathbb{R}^{4\times 4}$ and a particular element, $x_{3,1}$, my plan was to find the general form of the derivative of a single element, which would then lead to the matrix form. A given element of $X$ appears 8 times in the computation of the product $M$: 4 times in the left factor and 4 times in the right. My plan was to compute the left and right derivatives separately and then add them. Examining a single entry (with the transpose accounted for in the subscripts, 1-indexed),
$$ m_{3,2} = \\ (x_{3, 1}\cdot u_{1, 1} + x_{3, 2}\cdot u_{2, 1} + x_{3, 3}\cdot u_{3, 1} + x_{3, 4}\cdot u_{4, 1}) \cdot x_{2,1} \\ + (x_{3, 1}\cdot u_{1, 2} + x_{3, 2}\cdot u_{2, 2} + x_{3, 3}\cdot u_{3, 2} + x_{3, 4}\cdot u_{4, 2}) \cdot x_{2,2} \\ + (x_{3, 1}\cdot u_{1, 3} + x_{3, 2}\cdot u_{2, 3} + x_{3, 3}\cdot u_{3, 3} + x_{3, 4}\cdot u_{4, 3}) \cdot x_{2,3} \\ + (x_{3, 1}\cdot u_{1, 4} + x_{3, 2}\cdot u_{2, 4} + x_{3, 3}\cdot u_{3, 4} + x_{3, 4}\cdot u_{4, 4}) \cdot x_{2,4} $$
then $$ \frac {\partial m_{3,2} } {\partial x_{3,1}} = u_{1, 1} \cdot x_{2,1} + u_{1, 2} \cdot x_{2,2} + u_{1, 3} \cdot x_{2,3} + u_{1, 4} \cdot x_{2,4} = u_{1, :} \cdot x_{2,:}^T $$
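This single-entry derivative can be sanity-checked numerically; a quick sketch assuming numpy (the variable names are mine), comparing a central finite difference against the dot-product formula:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))

def m32(A):
    # m_{3,2} of A U A^T (0-indexed as [2, 1])
    return (A @ U @ A.T)[2, 1]

# central finite difference with respect to x_{3,1} (0-indexed [2, 0])
eps = 1e-6
Xp = X.copy(); Xp[2, 0] += eps
Xm = X.copy(); Xm[2, 0] -= eps
fd = (m32(Xp) - m32(Xm)) / (2 * eps)

analytic = U[0, :] @ X[1, :]  # u_{1,:} dotted with x_{2,:}
print(np.isclose(fd, analytic, atol=1e-4))  # True
```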
Each entry of $X$ is used 4 times on the left side, so differentiating the left side at all 4 places where $x_{3,1}$ appears gives (with the transposes cancelling out),
$$\frac {\partial m_{3,:}}{\partial x_{3,1}} = u_{1,:}\cdot X$$ $x_{3,1}$ will also appear 4 times on the right side (accounting for the transpose): $$m_{i,3} = \\ (x_{i, 1}\cdot u_{1, 1} + x_{i, 2}\cdot u_{2, 1} + x_{i, 3}\cdot u_{3, 1} + x_{i, 4}\cdot u_{4, 1}) \cdot x_{3,1} \\ + (x_{i, 1}\cdot u_{1, 2} + x_{i, 2}\cdot u_{2, 2} + x_{i, 3}\cdot u_{3, 2} + x_{i, 4}\cdot u_{4, 2}) \cdot x_{3,2} \\ + (x_{i, 1}\cdot u_{1, 3} + x_{i, 2}\cdot u_{2, 3} + x_{i, 3}\cdot u_{3, 3} + x_{i, 4}\cdot u_{4, 3}) \cdot x_{3,3} \\ + (x_{i, 1}\cdot u_{1, 4} + x_{i, 2}\cdot u_{2, 4} + x_{i, 3}\cdot u_{3, 4} + x_{i, 4}\cdot u_{4, 4}) \cdot x_{3,4} $$ suggesting that the right's derivative is,
$$\frac{\partial m_{:,3}}{\partial x_{3,1}} = X \cdot u_{:,1}$$
So the derivative with respect to the entry $x_{3,1}$ is $$\frac {\partial M}{\partial x_{3,1}} = u_{1,:} \cdot X + X \cdot u_{:,1},$$ or, more generally, for any element $x_{i,j}$, $$\frac {\partial M}{\partial x_{i,j}} = u_{j,:} \cdot X + X \cdot u_{:,j}$$
This led me to conclude that the desired derivative is, $$\frac {\partial M}{\partial X} = U \cdot X + X \cdot U^T$$
Verification with numerical auto-differentiators produces very different results. What am I missing?
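For concreteness, a finite-difference stand-in for the auto-differentiator can be sketched as follows (numpy assumed; the variable names are mine) — it produces one $4\times 4$ matrix of partials for each entry of $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
U = rng.standard_normal((4, 4))

def M(A):
    return A @ U @ A.T

# J[i, j, m, n] approximates dM_{ij}/dX_{mn} by central differences
eps = 1e-6
J = np.zeros((4, 4, 4, 4))
for m in range(4):
    for n in range(4):
        Xp = X.copy(); Xp[m, n] += eps
        Xm = X.copy(); Xm[m, n] -= eps
        J[:, :, m, n] = (M(Xp) - M(Xm)) / (2 * eps)

print(J.shape)  # (4, 4, 4, 4): one 4x4 matrix of partials per entry of X
```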
I will employ the Einstein summation convention, that is, repeated indices are summed over. We can write the $(i,j)$th component of $M$ as: \begin{align} M^i_j=X^i_kU^k_l(X^T)^l_j \end{align}

This is a sum of real numbers. If we interpret the $X^i_k$ as coordinates on the space of $n$ by $n$ matrices (which is fine: they form a vector space, and this is a basis), then by the product rule: \begin{align} \frac{\partial M^i_j}{\partial X^m_n}=&\frac{\partial X^i_k}{\partial X^m_n}U^k_l(X^T)^l_j+X^i_kU^k_l\frac{\partial (X^T)^l_j}{\partial X^m_n} \end{align}

Now, we see that: \begin{align} \frac{\partial X^i_k}{\partial X^m_n}=\begin{cases} 1&\text{if }i=m \text{ and } n=k\\ 0&\text{otherwise} \end{cases} \end{align} and: \begin{align} \frac{\partial (X^T)^l_j}{\partial X^m_n}=\begin{cases} 1&\text{if }j=m \text{ and } n=l\\ 0&\text{otherwise} \end{cases} \end{align} hence: \begin{align} \frac{\partial M^i_j}{\partial X^m_n}=&\delta^i_m\delta_k^nU^k_l(X^T)^l_j +X^i_kU^k_l\delta^l_n\delta_j^m\\ =&\delta^i_mU^n_l(X^T)^l_j+X^i_kU^k_n\delta^m_j\\ =&\delta^i_m(U\cdot X^T)^n_j+\delta^m_j(X\cdot U)^i_n \end{align}

Now, if we let $e_i$ be a basis for $\mathbb R^n$ and $e^i$ its dual, we can write any linear endomorphism $A$ as the sum: $$A=A^i_je_i\otimes e^j$$ So we see that: \begin{align} \frac{\partial M}{\partial X^m_n}=&\frac{\partial M^i_j}{\partial X^m_n}e_i\otimes e^j\\ =&\delta^i_m(U\cdot X^T)^n_j e_i\otimes e^j+\delta^m_j(X\cdot U)^i_n e_i\otimes e^j\\ =&(U\cdot X^T)^n_je_m\otimes e^j +(X\cdot U)^i_n e_i\otimes e^m \end{align}

That is, $\partial M/\partial X^m_n$ is the sum of two matrices: the first is all zeros except in the $m$th row, whose $j$th entry is the $(n,j)$th component of $U\cdot X^T$; the second is all zeros except in the $m$th column, whose $i$th entry is the $(i,n)$th component of $X\cdot U$.
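As a sanity check, the closed form $\partial M^i_j/\partial X^m_n=\delta^i_m(U\cdot X^T)^n_j+\delta^m_j(X\cdot U)^i_n$ can be compared against central finite differences; a sketch assuming numpy (the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n_dim = 4
X = rng.standard_normal((n_dim, n_dim))
U = rng.standard_normal((n_dim, n_dim))

UXt = U @ X.T
XU = X @ U

eps = 1e-6
ok = True
for m in range(n_dim):
    for n in range(n_dim):
        # finite-difference matrix of partials dM_{ij}/dX_{mn}
        Xp = X.copy(); Xp[m, n] += eps
        Xm = X.copy(); Xm[m, n] -= eps
        fd = ((Xp @ U @ Xp.T) - (Xm @ U @ Xm.T)) / (2 * eps)
        # closed form: row m gets row n of U X^T, column m gets column n of X U
        # (the (m, m) entry receives both contributions)
        pred = np.zeros((n_dim, n_dim))
        pred[m, :] += UXt[n, :]
        pred[:, m] += XU[:, n]
        ok &= np.allclose(fd, pred, atol=1e-4)
print(ok)  # True
```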
How one would want to organize this into a Jacobian matrix is not clear to me. I would probably write it as a matrix of one-forms indexed by $dX^m_n$, where $dX^m_n$ is the linear functional on $\mathbb R^{n^2}$ that is identically zero on every vector with no $e_m\otimes e^n$ component and takes $e_m\otimes e^n$ to $1$.
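If one does want a single Jacobian matrix, one common (though not canonical) convention is to flatten the four-index object into an $n^2\times n^2$ matrix, with rows indexed by the entries of $M$ and columns by the entries of $X$. A sketch assuming numpy, assembling the four-index derivative from the closed form above:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
X = rng.standard_normal((d, d))
U = rng.standard_normal((d, d))

# J[i, j, m, n] = delta(i, m) * (U X^T)[n, j] + delta(j, m) * (X U)[i, n]
I = np.eye(d)
J = (np.einsum('im,nj->ijmn', I, U @ X.T)
     + np.einsum('jm,in->ijmn', I, X @ U))

# Flatten (row-major) into a 16x16 Jacobian: row index = entry of M,
# column index = entry of X.
jac = J.reshape(d * d, d * d)
print(jac.shape)  # (16, 16)
```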