(ANSML is a tag I would like to use for Andrew Ng's Stanford Machine Learning - 2008)
In this course, there were four matrix identities that I would like to prove. \begin{align} \nabla_a \text{tr}AB &= B^T\\ \nabla_{A^T}f(A) &= (\nabla_Af(A))^T\\ \nabla_A\text{tr}ABA^TC &= CAB+C^TAB^T\\ \nabla_A |A| &= |A| (A^{-1})^T \end{align} where $A$ is an $m\times n$ matrix and $B$ is an $n \times m$ matrix, and $f:\mathbb{R}^{m \times n} \mapsto \mathbb{R}$ such that $f(A) = \text{tr}AB$.
So (to show I did my homework) I was able to prove the first two equations by using the fact that \begin{align} f(A) = \text{tr}AB &=\sum_{i=1}^nA_{1i}B_{i1} +\sum_{i=1}^nA_{2i}B_{i2} + \cdots + \sum_{i=1}^nA_{mi}B_{m1} \\ &= \sum_{j=1}^{m}\sum_{i=1}^{n} A_{ji}B_{ij}\\ \end{align}
and so for the first equation,
\begin{align}
\nabla_a \text{tr}AB &= \begin{bmatrix}\frac{\partial f}{\partial A_{11}}&\frac{\partial f}{\partial A_{12}} & \cdots &
\frac{\partial f}{\partial A_{1n}}\\
\frac{\partial f}{\partial A_{21}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{2n}}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial f}{\partial A_{m1}}&\frac{\partial f}{\partial A_{m2}} & \cdots & \frac{\partial f}{\partial A_{mn}}\end{bmatrix}\\&=B^T
\end{align}
by just doing the partial differentiation.
My question is:
- How do I start proving the third equation?
- How do I start proving the fourth equation?
Note that these eventually lead to the normal equations of the LMS for the gradient descent algorithms. (oh, please also guide me on whether this is classified correctly under the correct tags. I am very new to this topic).
There is no need to expand everything and take derivatives, we can prove the third identity using only the product rule along with part $1$.
Specifically, the product rule states that
$$\nabla_A \text{tr}(ABA^tC)=\nabla_A \text{tr}(ABXC)|_{X=A^t}+\nabla_A \text{tr}(YBA^tC)|_{Y=A},$$ and we see that the first term equals $C^tAB^t$ by the first identity. The second term may be evaluated by using trace properties to write $$\nabla_A \text{tr}(YBA^tC)|_{Y=A}=\nabla_A \text{tr}(AB^tY^tC^t)|_{Y=A}$$ which equals $CAB$ by the first identity, proving the desired result.