Computing matrix-vector calculus derivatives


Let $x, a \in \mathbb R^n$ and $A \in \mathbb R^{n\times n}$. Compute $d(x^T a)/dx$ and $d(x^T A x)/dx$.

I'm not sure how to think about these or how to carry out the computations. Can someone explain how to derive the two expressions?

Finally, what happens when $A$ and $X$ are both in $\mathbb R^{n\times n}$ and we want to find $d\,\mathrm{tr}(XA)/dX$?


There are 2 best solutions below


Let's use the convention that members of $\mathbb{R}^{n}$ are column vectors. Recall that $$ x^{T}a=\sum_{i=1}^{n}x_{i}a_{i} $$ and for a scalar $c$, $$ \frac{dc}{dx}\equiv\left(\begin{array}{c} \frac{dc}{dx_{1}}\\ \frac{dc}{dx_{2}}\\ \vdots\\ \frac{dc}{dx_{n}} \end{array}\right). $$ Therefore, $$ \frac{d\left(x^{T}a\right)}{dx}=\left(\begin{array}{c} a_{1}\\ a_{2}\\ \vdots\\ a_{n} \end{array}\right)=a. $$ You can follow the same argument, starting from $x^{T}Ax=\sum_{i=1}^{n}\sum_{j=1}^{n}x_{i}A_{ij}x_{j}$, to get $$ \frac{d\left(x^{T}Ax\right)}{dx}=\left(A+A^{T}\right)x, $$ which reduces to $2Ax$ when $A$ is symmetric. See a list of identities at https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities.
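As a sanity check on the componentwise definition above, here is a minimal finite-difference sketch in plain Python. The helper names (`quad_form`, `numerical_gradient`) and the specific values of `A`, `a`, and `x` are made up for illustration; `A` is deliberately not symmetric, so that $(A+A^{T})x$ and $2Ax$ give different vectors.

```python
def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of (df/dx_1, ..., df/dx_n) at x."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


# Illustrative values (assumptions, not from the question); A is not symmetric.
A = [[1.0, 2.0],
     [0.0, 3.0]]
a = [3.0, -1.0]
x = [1.0, 2.0]
n = len(x)

# Numerical gradients of x^T a and x^T A x.
grad_linear = numerical_gradient(lambda v: sum(vi * ai for vi, ai in zip(v, a)), x)
grad_quad = numerical_gradient(lambda v: quad_form(A, v), x)

# Closed forms: a for the linear case, (A + A^T) x for the quadratic case.
sym_form = [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]
two_Ax = [2 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

err_linear = max(abs(g - ai) for g, ai in zip(grad_linear, a))
err_sym = max(abs(g - s) for g, s in zip(grad_quad, sym_form))
err_two_Ax = max(abs(g - t) for g, t in zip(grad_quad, two_Ax))

print("max error vs a:          ", err_linear)
print("max error vs (A + A^T) x:", err_sym)
print("max error vs 2 A x:      ", err_two_Ax)
```

The first two errors should be at rounding level, while the last one is not small for this `A`, because $2Ax$ only matches the gradient in the symmetric case.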


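Note that $x^ta=a^tx$ and so $$ \frac{d(x^ta)}{dx}=\frac{d}{dh}a^t(x+h)|_{h=0}=a^t. $$ Similarly, expand $(x+h)^tA(x+h)=x^tAx+x^tAh+h^tAx+h^tAh$. The first term does not depend on $h$ and the last is second order in $h$, so neither contributes at $h=0$. Since $h^tAx=x^tA^th$, $$ \frac{d(x^tAx)}{dx}=\frac{d}{dh}(x+h)^tA(x+h)|_{h=0}=\frac{d}{dh}(x^tAh+h^tAx)|_{h=0}=x^t(A+A^t). $$ Finally, since $$ tr(XA)=\sum_{i=1}^n\sum_{j=1}^n X_{ij}A_{ji} $$ we have $$ \frac{d tr(XA)}{d X_{ij}}=A_{ji}, $$ which gives $$ \frac{d tr(XA)}{d X}=A^t. $$ All of this must be interpreted properly if used in other computations: the vector derivatives here are written as row vectors, so transpose them to get the gradient as a column vector.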
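The entrywise trace formula can be checked numerically in the same spirit. The sketch below is plain Python with hypothetical helper names (`trace_of_product`, `numerical_matrix_derivative`) and arbitrary illustrative matrices. It perturbs each $X_{ij}$ in turn and compares the result with $A_{ji}$.

```python
def trace_of_product(X, A):
    """Return tr(XA) = sum_i sum_j X[i][j] * A[j][i] for square matrices."""
    n = len(X)
    return sum(X[i][j] * A[j][i] for i in range(n) for j in range(n))


def numerical_matrix_derivative(f, X, eps=1e-6):
    """Central-difference approximation of the matrix with entries df/dX_ij."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            D[i][j] = (f(X_plus) - f(X_minus)) / (2 * eps)
    return D


# Illustrative values (assumptions, not from the question).
A = [[1.0, 2.0],
     [3.0, 4.0]]
X = [[0.5, -1.0],
     [2.0, 0.25]]
n = len(A)

D = numerical_matrix_derivative(lambda M: trace_of_product(M, A), X)
A_transpose = [[A[j][i] for j in range(n)] for i in range(n)]

max_err = max(abs(D[i][j] - A_transpose[i][j]) for i in range(n) for j in range(n))
print("max error vs A^t:", max_err)
```

Because $tr(XA)$ is linear in $X$, the finite-difference result does not depend on the choice of `X`, and the error should be at rounding level.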