$\nabla_C f(Cu)$ where $C$ is a matrix


Let $f: \mathbb{R}^n \to \mathbb{R}$. I am trying to find the gradient \begin{align} \nabla_C f(Cu) \end{align} where $C \in \mathbb{R}^{n \times k}$ and $u\in \mathbb{R}^k$. In other words, I want the derivative of $f(Cu)$ with respect to the matrix $C$. This is a question about the chain rule for matrix differentiation.

What I tried:

I found some links here, but I couldn't really follow everything. They appear to suggest that \begin{align} \frac{ \partial f(Cu)}{\partial C_{ij}}={\rm tr} \left( \nabla f(Cu) \left( \frac{\partial Cu}{\partial C_{ij}} \right)^T \right). \end{align}

Best answer:

$\def\p#1#2{\frac{\partial #1}{\partial #2}} \def\e{\eqalign}$ Define the $(x,g)$ vectors as $$\e{ x &= Cu \quad\implies\quad &dx = dC\,u \\ g &= \p{f}{x} &\big({\rm gradient\,of\,}f{\rm\,wrt\,}x\big) \\ }$$ and introduce a convenient product notation for the trace using a colon, i.e. $$A:B = {\rm Tr}(A^TB) \;\doteq\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; B:A$$ This product is also defined for vectors; just treat them as rectangular matrices.
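As a quick sanity check of the colon product defined above, here is a minimal NumPy sketch. The helper name `colon` and the random test matrices are assumptions made for illustration only; they are not part of the original answer.

```python
import numpy as np

def colon(A, B):
    """Colon (Frobenius) product A:B = Tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

# Tr(A^T B) equals the sum of elementwise products, and the product is symmetric.
elementwise = np.sum(A * B)
assert np.isclose(colon(A, B), elementwise)
assert np.isclose(colon(A, B), colon(B, A))

# Vectors are treated as n x 1 matrices, so g:x reduces to the dot product.
g = rng.standard_normal((3, 1))
x = rng.standard_normal((3, 1))
assert np.isclose(colon(g, x), (g.T @ x).item())
```

For vectors the colon product is just the ordinary dot product, which is why $df = g:dx$ is the familiar first-order expansion of $f$.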

Write the differential in terms of $x$, then change the independent variable from $x\to C$. $$\e{ df &= g:dx \\&= g:dC\,u \\&= gu^T:dC \\ \p{f}{C} &= gu^T &\big({\rm gradient\,of\,}f{\rm\,wrt\,}C\big) \\\\ }$$ The step from the second line to the third uses the cyclic property of the trace: $g:(dC\,u) = {\rm Tr}(g^T\,dC\,u) = {\rm Tr}(u\,g^T\,dC) = (gu^T):dC$.

As you have discovered, the difficulty with using the chain rule in matrix calculus is that the intermediate quantities are often third- and fourth-order tensors, which are difficult to work with.
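The result $\partial f/\partial C = gu^T$ can be checked numerically. The sketch below compares it against central finite differences; the test function `f`, its gradient `grad_f`, the dimensions, and the step size `eps` are all arbitrary choices assumed for illustration, not anything taken from the question.

```python
import numpy as np

def f(x):
    # Hypothetical smooth test function f: R^n -> R.
    return np.sum(np.sin(x)) + 0.5 * x @ x

def grad_f(x):
    # Gradient of the test function above with respect to x.
    return np.cos(x) + x

rng = np.random.default_rng(0)
n, k = 4, 3
C = rng.standard_normal((n, k))
u = rng.standard_normal(k)

# Analytic gradient from the derivation: g u^T, with g evaluated at x = C u.
analytic = np.outer(grad_f(C @ u), u)

# Central finite differences, perturbing one entry C_ij at a time.
eps = 1e-6
numeric = np.zeros_like(C)
for i in range(n):
    for j in range(k):
        E = np.zeros_like(C)
        E[i, j] = eps
        numeric[i, j] = (f((C + E) @ u) - f((C - E) @ u)) / (2 * eps)

max_err = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_err)
```

The gradient has the same shape as $C$ (here $n \times k$), and its $(i,j)$ entry is $g_i u_j$, which agrees with the componentwise trace formula quoted in the question.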

The nice thing about the differential approach is that the differential of a matrix acts like a matrix. In particular, it follows all of the rules of matrix algebra.

Using the trace (i.e. the colon product) also eliminates a whole category of transposition errors that arise with other approaches.