Why does $\frac{\partial a^TX a}{\partial X} = aa^T$?


$$\frac{\partial a^TX a}{\partial X} = \frac{\partial a^TX^T a}{\partial X} = aa^T \tag 1$$

I got (1) from the Matrix Cookbook, but I don't see how to derive it. Why isn't it $a^Ta$?

Assume that $a$ is an arbitrary vector of real numbers of length $n$ and $X$ is an arbitrary $n\times n$ matrix.


There are 2 best solutions below


Write out $a^TXa$ by component:

$$ a^TXa = \sum_{ij} a_ix_{ij}a_j $$
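For instance, taking $n=2$ only for concreteness, the sum has four terms, and each entry $x_{ij}$ appears in exactly one of them:

$$ a^TXa = \begin{pmatrix} a_1 & a_2 \end{pmatrix}\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a_1^2x_{11} + a_1a_2x_{12} + a_2a_1x_{21} + a_2^2x_{22} $$

Differentiating term by term, the derivatives with respect to $x_{11}, x_{12}, x_{21}, x_{22}$ are $a_1^2, a_1a_2, a_2a_1, a_2^2$, which are the entries of $aa^T$ in the same positions.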

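Recall that $\frac{\partial}{\partial X}$ is shorthand for the matrix whose $(i,j)$ entry is $\frac{\partial}{\partial x_{ij}}$ (the convention used in the Matrix Cookbook). Each entry $x_{ij}$ appears in exactly one term of the sum, namely $a_ix_{ij}a_j$, so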

$$ \frac{\partial a^TXa}{\partial x_{ij}} = a_ia_j $$

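Stacking these into a matrix gives the matrix whose $(i,j)$ entry is $a_ia_j$, and that is exactly $aa^T$. Similarly, $a^TX^Ta = \sum_{ij} a_ix_{ji}a_j$, whose derivative with respect to $x_{ij}$ is $a_ja_i$, so it gives the same result. Why not $a^Ta$? Just look at the dimensions! $a^Ta = \sum_i a_i^2$ is a scalar, while a derivative with respect to an $n\times n$ matrix must itself be $n\times n$, like $aa^T$. :)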
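A quick numerical sanity check, as a sketch in plain Python (no external libraries assumed): approximate each $\partial f/\partial x_{ij}$ by a central difference and compare the resulting matrix with $aa^T$, for both $f(X)=a^TXa$ and $f(X)=a^TX^Ta$. The helper names (`quad_form`, `numerical_gradient`, `outer`) are illustrative, and the random $a$ and $X$ are arbitrary test data.

```python
import random

def quad_form(a, X):
    """Return a^T X a = sum_{i,j} a_i x_ij a_j."""
    n = len(a)
    return sum(a[i] * X[i][j] * a[j] for i in range(n) for j in range(n))

def transpose(X):
    return [list(row) for row in zip(*X)]

def numerical_gradient(f, X, h=1e-6):
    """Matrix whose (i, j) entry approximates df/dx_ij by central differences."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def outer(a):
    """a a^T: the n x n matrix with (i, j) entry a_i a_j."""
    return [[ai * aj for aj in a] for ai in a]

random.seed(0)
n = 3
a = [random.uniform(-1, 1) for _ in range(n)]
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

G = numerical_gradient(lambda M: quad_form(a, M), X)               # d(a^T X a)/dX
G_t = numerical_gradient(lambda M: quad_form(a, transpose(M)), X)  # d(a^T X^T a)/dX
expected = outer(a)

max_err = max(abs(G[i][j] - expected[i][j]) for i in range(n) for j in range(n))
max_err_t = max(abs(G_t[i][j] - expected[i][j]) for i in range(n) for j in range(n))
print(max_err, max_err_t)
```

Because $f$ is linear in $X$, the central difference is exact up to rounding, so both errors should be at the level of floating-point noise. Note also that `G` is $n\times n$, not a scalar.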


A more general result is as follows. Let $M:=\text{Mat}_{n\times n}(\mathbb{R})$. Define $f:M\to \mathbb{R}$ by $f(X)=\text{trace}(AX)$ for some fixed $A \in M$. Then $$\frac{\partial}{\partial X}\,f(X)=A^\top\,,$$ since $f(X)=\sum_{i,j}A_{ji}X_{ij}$ gives $\frac{\partial f}{\partial X_{ij}}=A_{ji}$. In this particular case, $A=aa^\top$, so by the cyclic property of the trace (and because the trace of a $1\times 1$ matrix is its only entry), $f(X)=\text{trace}\left(aa^\top X\right)=\text{trace}\left(a^\top Xa\right)=a^\top Xa$. Hence, $$\frac{\partial}{\partial X}\,\left(a^\top X a\right)=\left(aa^\top\right)^\top =aa^\top\,.$$
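The general claim can be checked numerically in the same way. This is a plain-Python sketch with illustrative helper names; the non-symmetric $A$ below is an arbitrary example chosen so that $A$ and $A^\top$ differ, which makes the transpose in the result visible. It also checks the cyclic-trace identity $\text{trace}(aa^\top X)=a^\top Xa$ on random data.

```python
import random

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    return [list(row) for row in zip(*M)]

def numerical_gradient(f, X, h=1e-6):
    """Matrix whose (i, j) entry approximates df/dX_ij by central differences."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

random.seed(1)
n = 3
# Arbitrary non-symmetric A, so that A and A^T differ.
A = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 4.0],
     [5.0, 0.0, 6.0]]
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

G = numerical_gradient(lambda M: trace(matmul(A, M)), X)
err_vs_At = max_abs_diff(G, transpose(A))  # near zero: the gradient is A^T
err_vs_A = max_abs_diff(G, A)              # large: the gradient is not A

# Special case A = a a^T: trace(a a^T X) should equal a^T X a.
a = [random.uniform(-1, 1) for _ in range(n)]
aaT = [[ai * aj for aj in a] for ai in a]
lhs = trace(matmul(aaT, X))
rhs = sum(a[i] * X[i][j] * a[j] for i in range(n) for j in range(n))
print(err_vs_At, err_vs_A, abs(lhs - rhs))
```

Since $aa^\top$ is symmetric, the transpose makes no difference in the original question, which is why $aa^\top$ appears directly.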