Let $x, a \in \mathbb R^n$ and $A \in \mathbb R^{n\times n}$. Compute $d(x^T a)/dx$ and $d(x^T A x)/dx$.
I'm not sure how to think about these or how to compute them. Can someone explain how to derive the two expressions?
Finally, what happens when $A$ and $X$ are both in $\mathbb R^{n\times n}$ and we want to find $d\,\operatorname{tr}(XA)/dX$?
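Let's use the convention that members of $\mathbb{R}^{n}$ are column vectors, and that the derivative of a scalar $c$ with respect to $x$ is the column vector of its partial derivatives: $$ \frac{dc}{dx}\equiv\left(\begin{array}{c} \frac{\partial c}{\partial x_{1}}\\ \frac{\partial c}{\partial x_{2}}\\ \vdots\\ \frac{\partial c}{\partial x_{n}} \end{array}\right). $$ Recall that $$ x^{T}a=\sum_{i=1}^{n}x_{i}a_{i}, $$ so $\partial\left(x^{T}a\right)/\partial x_{k}=a_{k}$ for each $k$. Therefore, $$ \frac{d\left(x^{T}a\right)}{dx}=\left(\begin{array}{c} a_{1}\\ a_{2}\\ \vdots\\ a_{n} \end{array}\right)=a. $$

The same entrywise argument handles the quadratic form. Writing $$ x^{T}Ax=\sum_{i=1}^{n}\sum_{j=1}^{n}x_{i}A_{ij}x_{j}, $$ the product rule gives $$ \frac{\partial\left(x^{T}Ax\right)}{\partial x_{k}}=\sum_{j=1}^{n}A_{kj}x_{j}+\sum_{i=1}^{n}x_{i}A_{ik}=(Ax)_{k}+(A^{T}x)_{k}, $$ so $$ \frac{d\left(x^{T}Ax\right)}{dx}=\left(A+A^{T}\right)x, $$ which reduces to $2Ax$ only when $A$ is symmetric. See a list of identities at https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities.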
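The trace question yields to the same entrywise argument: $\operatorname{tr}(XA)=\sum_{i,j}X_{ij}A_{ji}$, so $\partial\operatorname{tr}(XA)/\partial X_{ij}=A_{ji}$, which gives $A^{T}$ when the derivative is arranged with the same shape as $X$. Below is a minimal sketch, not a definitive implementation, that checks all three identities numerically with central finite differences. It assumes plain Python lists as vectors and matrices (no NumPy), and every helper name in it (`num_grad_vec`, `num_grad_mat`, `close`, and so on) is hypothetical and chosen for illustration.

```python
# Hypothetical finite-difference check of the gradient identities (pure Python, no NumPy).
import random

def num_grad_vec(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the vector x."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def num_grad_mat(f, X, h=1e-6):
    """Central-difference gradient of f at the matrix X, arranged with the same shape as X."""
    rows, cols = len(X), len(X[0])
    G = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            Xp = [r[:] for r in X]; Xp[i][j] += h
            Xm = [r[:] for r in X]; Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(X, A):
    At = transpose(A)  # columns of A
    return [[dot(row, col) for col in At] for row in X]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def close(u, v, tol=1e-5):
    return all(abs(p - q) < tol for p, q in zip(u, v))

random.seed(0)
n = 4
x = [random.uniform(-1, 1) for _ in range(n)]
a = [random.uniform(-1, 1) for _ in range(n)]
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]  # deliberately non-symmetric
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

# d(x^T a)/dx should equal a
g1 = num_grad_vec(lambda v: dot(v, a), x)
# d(x^T A x)/dx should equal (A + A^T) x
g2 = num_grad_vec(lambda v: dot(v, matvec(A, v)), x)
expected2 = [p + q for p, q in zip(matvec(A, x), matvec(transpose(A), x))]
# d tr(XA)/dX should equal A^T
G3 = num_grad_mat(lambda M: trace(matmul(M, A)), X)

print(close(g1, a))
print(close(g2, expected2))
print(all(close(r, s) for r, s in zip(G3, transpose(A))))
# 2Ax is expected to disagree here, since A was not made symmetric
print(close(g2, [2 * c for c in matvec(A, x)]))
```

Because $x^{T}a$ and $\operatorname{tr}(XA)$ are linear and $x^{T}Ax$ is quadratic, central differences are exact for them up to rounding, so the loose tolerance is comfortable. Replacing `A` with a symmetric matrix, for example `[[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]`, should make the last check agree with $2Ax$ as well.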