What is the derivative of $\mathbf{a}^TX^2\mathbf{a}$ with respect to the symmetric matrix $X$?


Given a constant vector $\mathbf{a}\in{\rm I\!R}^n$ and a real symmetric matrix $X\in{\rm I\!R}^{n\times n}$, what is the derivative of $\mathbf{a}^T X^2 \mathbf{a}$ with respect to $X$?


I tried a simple example using $n=2$, with the following vector and matrix: $$ \mathbf{a} = \begin{bmatrix}a\\b\end{bmatrix} \qquad X = \begin{bmatrix} x & z \\ z & y \end{bmatrix} $$

What we want to differentiate is $$ \mathbf{a}^T X^2 \mathbf{a} = a^2 (x^2 + z^2) + 2 ab (xz+yz) + b^2 (z^2 + y^2) $$

Differentiating with respect to each independent parameter gives $$ \frac{\partial \mathbf{a}^T X^2 \mathbf{a}}{\partial X} = \begin{bmatrix} \frac{\partial \mathbf{a}^T X^2 \mathbf{a}}{\partial x} & \frac{\partial \mathbf{a}^T X^2 \mathbf{a}}{\partial z} \\ \frac{\partial \mathbf{a}^T X^2 \mathbf{a}}{\partial z} & \frac{\partial \mathbf{a}^T X^2 \mathbf{a}}{\partial y} \end{bmatrix} = 2 \begin{bmatrix} a^2 x + abz & a^2z + abx + aby + b^2z \\ a^2z + abx + aby + b^2z & b^2 y + abz \end{bmatrix} $$ I tried to rewrite this to end up with something meaningful, but I could only write
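As a sanity check, the $2\times 2$ gradient above can be compared against central finite differences in the three independent parameters $x$, $z$, $y$. Here is a small NumPy sketch; the numeric values of $\mathbf{a}$ and $X$ are arbitrary:

```python
import numpy as np

# Numeric sanity check of the 2x2 gradient (values are arbitrary).
a_vec = np.array([2.0, 3.0])              # a = 2, b = 3
x, y, z = 1.0, -1.5, 0.5
X = np.array([[x, z], [z, y]])

phi = lambda M: a_vec @ (M @ M) @ a_vec   # phi(X) = a^T X^2 a

# Central-difference partials w.r.t. the independent parameters;
# z appears in two entries of X, so perturb it symmetrically.
eps = 1e-6
def d_param(i, j):
    E = np.zeros((2, 2))
    E[i, j] = E[j, i] = 1.0               # symmetric perturbation
    return (phi(X + eps * E) - phi(X - eps * E)) / (2 * eps)

a, b = a_vec
grad_formula = 2 * np.array([
    [a*a*x + a*b*z,                   a*a*z + a*b*x + a*b*y + b*b*z],
    [a*a*z + a*b*x + a*b*y + b*b*z,   b*b*y + a*b*z],
])
grad_numeric = np.array([[d_param(0, 0), d_param(0, 1)],
                         [d_param(1, 0), d_param(1, 1)]])
print(np.allclose(grad_formula, grad_numeric, atol=1e-5))  # True
```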

$$ \frac{\partial \mathbf{a}^T X^2 \mathbf{a}}{\partial X} = 2X\mathbf{a}\mathbf{a}^T + 2 \begin{bmatrix} 0 & a^2 z + ab y \\ abx + b^2z & 0 \end{bmatrix} $$ and I do not know what to do with the matrix on the right...


What I would like is a valid expression for any $n > 1$ involving only $X$ and $\mathbf{a}$.



Hint

The Fréchet derivative of $f(X) =\mathbf{a}^TX^2\mathbf{a}$ at a point $X_0$, applied to a direction $h$, is given by

$$\partial_{X_0}f(h) = \mathbf{a}^TX_0 h\mathbf{a} + \mathbf{a}^Th X_0 \mathbf{a}$$
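This directional-derivative formula can be checked numerically: for a small step $t$, the difference quotient $\big(f(X_0+t\,h)-f(X_0)\big)/t$ should approach the expression above. A quick sketch with arbitrary random data:

```python
import numpy as np

# Numeric check of the Frechet derivative formula (arbitrary data):
# (f(X0 + t*h) - f(X0)) / t  ~  a^T X0 h a + a^T h X0 a  for small t.
rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
X0 = rng.standard_normal((n, n)); X0 = X0 + X0.T   # symmetric point
h  = rng.standard_normal((n, n)); h  = h + h.T     # symmetric direction

f = lambda X: a @ (X @ X) @ a
t = 1e-6
numeric = (f(X0 + t * h) - f(X0)) / t
exact   = a @ X0 @ h @ a + a @ h @ X0 @ a
print(abs(numeric - exact))  # small, O(t)
```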


$\def\m#1{\left[\begin{array}{c}#1\end{array}\right]}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Let $U$ be an unconstrained matrix, and use a colon to denote the trace of a matrix product, i.e. $$A:B = {\rm Tr}(A^TB) = B:A$$ Write the function using the colon product and calculate the unconstrained gradient. $$\eqalign{ \phi &= aa^T:U^2 \\ d\phi &= aa^T:(U\,dU+dU\,U) \\ &= (Uaa^T+aa^TU):dU \\ \p{\phi}{U} &= Uaa^T+aa^TU \;\doteq\; G \qquad({\rm gradient}) \\ }$$

Here is a recipe for converting the unconstrained gradient $G$ (evaluated at the symmetric point $U=X$) into the desired constrained form $$\eqalign{ G_S &\doteq G+G^T - {\rm Diag}(G) \\ &= (Xaa^T+aa^TX) + (Xaa^T+aa^TX)^T -{\rm Diag}(Xaa^T+aa^TX) \\ &= 2(Xaa^T+aa^TX) - {\rm Diag}(Xaa^T+aa^TX) \\ \\ }$$

Apply this general result to your $2\times 2$ example. $$\eqalign{ A = Xaa^T &= \m{a^2x+abz & abx+b^2z \\ a^2z+aby & abz+b^2y} \\ B = A+A^T &= \m{2(a^2x+abz) & (abx+b^2z+a^2z+aby) \\ (a^2z+aby+abx+b^2z) & 2(abz+b^2y)} \\ G_S = 2B-{\rm Diag}(B) &= \m{2(a^2x+abz)&2(abx+b^2z+a^2z+aby)\\2(a^2z+aby+abx+b^2z)&2(abz+b^2y)} \\ }$$ which is the same result that you obtained.
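The recipe can also be checked numerically beyond $n=2$, comparing $G_S$ entrywise against central differences in the independent entries $x_{ij}$, $i\le j$. A sketch with arbitrary random data:

```python
import numpy as np

# Check that G_S = G + G^T - Diag(G), with G = X a a^T + a a^T X,
# reproduces the partials w.r.t. the independent entries x_ij (i <= j).
rng = np.random.default_rng(1)
n = 3
a = rng.standard_normal(n)
X = rng.standard_normal((n, n)); X = X + X.T       # symmetric test point

phi = lambda M: a @ (M @ M) @ a
G = X @ np.outer(a, a) + np.outer(a, a) @ X
G_S = G + G.T - np.diag(np.diag(G))

eps = 1e-6
ok = True
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n)); E[i, j] = E[j, i] = 1.0   # symmetric perturbation
        d = (phi(X + eps * E) - phi(X - eps * E)) / (2 * eps)
        ok &= np.isclose(G_S[i, j], d, atol=1e-4)
print(ok)  # True
```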

Having provided you with the formula that you were searching for, I must warn you that it is nonsense.

What you should do is extract a vector of fully independent parameters from the $X$ matrix using the half-vec operation $$\eqalign{ p &= {\rm vech}(X) = \m{x\\z\\y} \\ }$$ and solve whatever problem you have in mind in terms of this vector.

Everyone agrees that the following vector gradient is valid and unambiguous $$\eqalign{ g\doteq \p{\phi}{p} &= \m{2(a^2x+abz) \\ 2(abx+b^2z+a^2z+aby) \\ 2(abz+b^2y)} \\ }$$ However, casting this vector into matrix form using the reverse of the ${\rm vech}()$ function creates a thing which is difficult to interpret as a gradient, and hard to use (properly) in algorithms such as gradient descent.

Instead, you should leave the gradient in vector form and use it to optimize/solve for the $p$ vector. Then, as a post-processing step, you can cast the solution back into matrix form $$X = {\rm unvech}(p)$$
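A minimal sketch of that workflow, with hypothetical helpers `vech`/`unvech` written here by hand (not from any library). Since $\phi(X)=\mathbf{a}^TX^2\mathbf{a}=\|X\mathbf{a}\|^2$ for symmetric $X$, gradient descent on $p$ should drive $\phi$ down; the data below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
a = rng.standard_normal(n)
iu = np.triu_indices(n)                 # index pairs (i, j), i <= j

def vech(X):
    """Stack the independent entries x_ij, i <= j, into a vector."""
    return X[iu]

def unvech(p):
    """Rebuild the symmetric matrix from its independent entries."""
    X = np.zeros((n, n))
    X[iu] = p
    return X + X.T - np.diag(np.diag(X))

phi = lambda X: a @ (X @ X) @ a         # phi(X) = a^T X^2 a = ||X a||^2

p = vech(rng.standard_normal((n, n)))
phi0 = phi(unvech(p))
for _ in range(2000):
    X = unvech(p)
    G = X @ np.outer(a, a) + np.outer(a, a) @ X
    p = p - 0.01 * vech(G + G.T - np.diag(np.diag(G)))  # step along g = vech(G_S)

phi_final = phi(unvech(p))
print(phi0, phi_final)                  # phi should decrease toward its minimum, 0
```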