Gradient of $X \mapsto a^T X b$ when $X$ is symmetric

For a matrix $X \in \Bbb R^{n \times n}$ and vectors $a \in \Bbb R^n$ and $b \in \Bbb R^n$, I know that the following holds:

$$\nabla_X \left( a^T X b \right) = a{b^T}$$
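
As a sanity check of the unconstrained formula, here is a minimal finite-difference sketch, assuming NumPy; the helper names `f` and `numerical_grad` are hypothetical. Every one of the $n^2$ entries of $X$ is perturbed independently, and because $f$ is linear in $X$, central differences recover $ab^T$ up to rounding error.

```python
import numpy as np


def f(X, a, b):
    """Scalar function f(X) = a^T X b."""
    return a @ X @ b


def numerical_grad(X, a, b, h=1e-6):
    """Central-difference gradient that perturbs each of the n^2 entries of X independently."""
    n = X.shape[0]
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * h)
    return G


rng = np.random.default_rng(0)
n = 4
a, b = rng.standard_normal(n), rng.standard_normal(n)
X = rng.standard_normal((n, n))  # no symmetry constraint here

G = numerical_grad(X, a, b)
print(np.allclose(G, np.outer(a, b)))  # expected: True, i.e. the gradient is a b^T
```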

However, it seems that if $X$ is a symmetric matrix ($X \in \Bbb S^n$), then

$$ \nabla_X \left( {a^T} X b \right) = \frac{1}{2}(a{b^T} + {b}a^T) $$

How should I understand this? If $X \in \Bbb S^n$, then $X$ has only $\frac{n(n+1)}{2}$ independent entries. Why should the gradient then have $n^2$ elements?

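Define a non-standard symmetrizing operation for a square matrix $A$ as
$$ {\rm nsym}(A) = A + A^T - I\circ A $$
where $\circ$ is the elementwise (Hadamard) product, so $I\circ A$ keeps only the diagonal of $A$.

Now suppose that you have determined the differential of some scalar-valued function $f(X)$ to be
$$ df = A:dX $$
where $:$ denotes the Frobenius product. Later you are told that $X$ is constrained to be symmetric. How does such a constraint modify the unconstrained result? The answer is to use nsym():
$$ df = {\rm nsym}(A):dX $$
Here the entries of ${\rm nsym}(A)$ are the partial derivatives of $f$ with respect to the independent entries $X_{ij}$ with $i \le j$: an off-diagonal $X_{ij}$ also appears as $X_{ji}$, so it collects $A_{ij} + A_{ji}$, while a diagonal entry appears only once and collects $A_{ii}$.

Applying this to the current problem, where $A = ab^T$, yields
$$ \frac{\partial f}{\partial X} = ab^T + ba^T - {\rm diag}(a\circ b) $$
where ${\rm diag}(a\circ b)$ is the diagonal matrix with entries $a_i b_i$.

Many people mistakenly apply the standard symmetrizing operator
$$ {\rm sym}(A) = \frac{1}{2}(A + A^T) $$
in this situation.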
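
The nsym result for $f(X) = a^T X b$ can be checked numerically in the same spirit. The sketch below assumes NumPy and uses hypothetical helper names (`nsym`, `grad_independent_entries`). It treats the entries $X_{ij}$ with $i \le j$ as the free variables and perturbs $X_{ij}$ and $X_{ji}$ together so that $X$ stays symmetric, which is the convention under which ${\rm nsym}$ gives the partial derivatives.

```python
import numpy as np


def nsym(A):
    """The answer's non-standard symmetrization nsym(A) = A + A^T - I∘A."""
    return A + A.T - np.eye(A.shape[0]) * A  # np.eye(n) * A is the Hadamard product I∘A


def f(X, a, b):
    """Scalar function f(X) = a^T X b."""
    return a @ X @ b


def grad_independent_entries(X, a, b, h=1e-6):
    """Partial derivatives of f w.r.t. the independent entries X_ij (i <= j) of a symmetric X.

    Perturbing an off-diagonal X_ij also perturbs X_ji so that X stays symmetric;
    each derivative is written to both (i, j) and (j, i) of the returned n x n array.
    """
    n = X.shape[0]
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(i, n):
            E = np.zeros_like(X)
            E[i, j] = h
            E[j, i] = h  # same entry as the line above when i == j
            G[i, j] = G[j, i] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * h)
    return G


rng = np.random.default_rng(1)
n = 4
a, b = rng.standard_normal(n), rng.standard_normal(n)
M = rng.standard_normal((n, n))
X = (M + M.T) / 2  # a symmetric point

G = grad_independent_entries(X, a, b)
A = np.outer(a, b)  # unconstrained gradient a b^T
print(np.allclose(G, nsym(A)))                   # expected: True
print(np.allclose(G, A + A.T - np.diag(a * b)))  # expected: True, the formula from the answer
```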

BTW, if $X$ is instead constrained to be diagonal, then the operation to apply is $$ {\rm dsym}(A) = I\circ A $$
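
A similar sketch for the diagonal case, again assuming NumPy and hypothetical helper names (`dsym`, `grad_diagonal_entries`), perturbs only the diagonal entries of $X$ (the only free variables) and compares the result against ${\rm dsym}(ab^T)$, whose diagonal entries are $a_i b_i$.

```python
import numpy as np


def dsym(A):
    """The answer's operation for a diagonal constraint: dsym(A) = I∘A."""
    return np.eye(A.shape[0]) * A


def f(X, a, b):
    """Scalar function f(X) = a^T X b."""
    return a @ X @ b


def grad_diagonal_entries(X, a, b, h=1e-6):
    """Partial derivatives of f with respect to the diagonal entries X_ii only."""
    n = X.shape[0]
    G = np.zeros_like(X)
    for i in range(n):
        E = np.zeros_like(X)
        E[i, i] = h
        G[i, i] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * h)
    return G


rng = np.random.default_rng(2)
n = 4
a, b = rng.standard_normal(n), rng.standard_normal(n)
X = np.diag(rng.standard_normal(n))  # a diagonal point

G = grad_diagonal_entries(X, a, b)
print(np.allclose(G, dsym(np.outer(a, b))))  # expected: True
```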