ANSML - Proving of the matrix identity $\nabla_AtrABA^TC = CAB+C^TAB^T$

339 Views Asked by At

(ANSML is a tag I would like to use for Andrew Ng's Stanford Machine Learning - 2008)

In this course, there were four matrix identities that I would like to prove. \begin{align} \nabla_a \text{tr}AB &= B^T\\ \nabla_{A^T}f(A) &= (\nabla_Af(A))^T\\ \nabla_A\text{tr}ABA^TC &= CAB+C^TAB^T\\ \nabla_A |A| &= |A| (A^{-1})^T \end{align} where $A$ is an $m\times n$ matrix and $B$ is an $n \times m$ matrix, and $f:\mathbb{R}^{m \times n} \mapsto \mathbb{R}$ such that $f(A) = \text{tr}AB$.

So (to show I did my homework) I was able to prove the first two equations by using the fact that \begin{align} f(A) = \text{tr}AB &=\sum_{i=1}^nA_{1i}B_{i1} +\sum_{i=1}^nA_{2i}B_{i2} + \cdots + \sum_{i=1}^nA_{mi}B_{m1} \\ &= \sum_{j=1}^{m}\sum_{i=1}^{n} A_{ji}B_{ij}\\ \end{align}

and so for the first equation, \begin{align} \nabla_a \text{tr}AB &= \begin{bmatrix}\frac{\partial f}{\partial A_{11}}&\frac{\partial f}{\partial A_{12}} & \cdots & \frac{\partial f}{\partial A_{1n}}\\ \frac{\partial f}{\partial A_{21}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{2n}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial A_{m1}}&\frac{\partial f}{\partial A_{m2}} & \cdots & \frac{\partial f}{\partial A_{mn}}\end{bmatrix}\\&=B^T \end{align} by just doing the partial differentiation.
My question is:

  • How do I start proving the third equation?
  • How do I start proving the fourth equation?

Note that these eventually lead to the normal equations of the LMS for the gradient descent algorithms. (oh, please also guide me on whether this is classified correctly under the correct tags. I am very new to this topic).

2

There are 2 best solutions below

1
On

There is no need to expand everything and take derivatives, we can prove the third identity using only the product rule along with part $1$.

Specifically, the product rule states that

$$\nabla_A \text{tr}(ABA^tC)=\nabla_A \text{tr}(ABXC)|_{X=A^t}+\nabla_A \text{tr}(YBA^tC)|_{Y=A},$$ and we see that the first term equals $C^tAB^t$ by the first identity. The second term may be evaluated by using trace properties to write $$\nabla_A \text{tr}(YBA^tC)|_{Y=A}=\nabla_A \text{tr}(AB^tY^tC^t)|_{Y=A}$$ which equals $CAB$ by the first identity, proving the desired result.

0
On

The product notation (:) for the trace is quicker to type, e.g. $$A:B = {\rm tr}(A^TB) = {\rm tr}(AB^T)$$ In this notation, your 3rd function becomes $$\eqalign{ \phi &= CAB:A \cr d\phi &= C\,dA\,B:A + CAB:dA \cr &= dA:C^TAB^T + CAB:dA \cr &= (C^TAB^T + CAB):dA \cr \nabla_A\phi &= C^TAB^T + CAB \cr }$$ Your 4th function can be written in terms of the eigenvalues $$\eqalign{ \psi &= \det A = \lambda_1*\lambda_2*\ldots*\lambda_n \cr \log\psi &= \log\lambda_1+\log\lambda_2+\ldots+\log\lambda_n = {\rm tr}\log A + 2k\pi i \cr d\log\psi &= d\,{\rm tr}\log A \cr \frac{d\psi}{\psi} &= A^{-T}:dA \cr \nabla_A\psi &= \psi A^{-T} \cr }$$ Some of the eigenvalues could be complex, making the $\log$ function multi-valued. The $(2k\pi i)$ collects all of that ugliness into a single term, which (fortunately) vanishes under differentiation. $$\eqalign{\cr}$$