Applying the chain rule to a function mapping matrices to matrices


Given a function $F:\text{Mat}(n,n)\rightarrow\text{Mat}(n,n)$, I'd like to get $\frac{d}{dM_{ij}}F(M)^2$ using matrix calculus.

I already derived it by writing it out explicitly for my choice of $F$, but this doesn't seem to generalize well if I want to take second derivatives of higher powers of $F(M)$, so I want to finally get my head around matrix calculus.

To get it in matrix notation I noted that $$\frac{d}{dM_{ij}}F(M)^2 = \left(\frac{d}{dM} F(M)^2\right)[e^{ij}]$$ where $e^{ij}\in\text{Mat}(n,n)$ is the matrix with entries $(e^{ij})_{xy} = \delta_{ix}\delta_{jy}$.

Now if $F$ were the identity, then I'd get $Me^{ij}+e^{ij}M$ (or more generally $\sum_{l=0}^{p-1}M^le^{ij}M^{p-l-1}$ for $F(M)^p$ instead of $F(M)^2$). Using the chain rule, would I get $$\left(F(M)e^{ij}+e^{ij}F(M)\right)\left(\frac{d}{dM}F(M)[e^{ij}]\right) ?$$

Optional information:

This seems weird, since I multiply by $e^{ij}$ (or pass it as an argument) twice, which doesn't make much sense in the scalar case (e.g. $n=1$), but

  1. I don't know how to make use of the $(\frac{d}{dX} X^2)[Y]=XY+YX$ otherwise
  2. for symmetric $M$ and my choice of $F$ (which maps positive $M$ to positive $F(M)$) the trace of this object coincides with my direct computation (not sure if that's just a coincidence caused by some of the involved symmetries though).

What I looked up so far: Most of the notes I could find online only deal with scalar-valued functions of matrices or vectors. The Matrix Cookbook (http://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf, p. 15, eq. 136) seems to be one exception, but it falls back on fairly direct computations, which is basically what I did so far. Instead I was thinking about getting something along the lines of the question "Matrix chain rule question: what is $\frac{d}{dX} f(S)$ where $S = (A+X)^{-1}$" (just set $f=(F(M)^2)_{xy}$), but this doesn't seem to work if I don't know how to invert $F$.


There are 2 best solutions below


Let a colon (:) represent the double-contraction product, e.g. $$\eqalign{C &= A:B \cr C_{ijmn} &= A_{ijkl}\,B_{klmn}}$$ Also, let ${\mathcal E}$ denote the 4th order isotropic tensor with components $${\mathcal E}_{ijkl}=\delta_{ik}\,\delta_{jl}$$ I'll assume that you know how to calculate the function $F$ and its gradient ${\mathcal G}=\frac{\partial F}{\partial M}$.
Then the differential and gradient of your squared function are $$\eqalign{ S &= F^2 \cr dS &= dF\,F + F\,dF \cr &= ({\mathcal E}F^T+F{\mathcal E}):dF \cr &= ({\mathcal E}F^T+F{\mathcal E}):{\mathcal G}:dM \cr \frac{\partial S}{\partial M} &= ({\mathcal E}F^T+F{\mathcal E}):{\mathcal G} \cr }$$ If you are uncomfortable with tensors, then you can vectorize all of the terms, i.e. $s={\rm vec}(S), f={\rm vec}(F),$ etc, to obtain $$\eqalign{ {\rm vec}(dS) &= {\rm vec}(dF\,F + F\,dF) \cr ds &= (F^T\otimes I+I\otimes F)\,df \cr &= (F^T\otimes I+I\otimes F)\,G\,dm \cr \frac{\partial s}{\partial m} &= (F^T\otimes I+I\otimes F)\,G \cr }$$ where $G=\frac{\partial f}{\partial m}$ is a matrix, not a 4th order tensor like ${\mathcal G}$.
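The vectorized formula can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are not in the question: it picks the hypothetical test function $F(M)=M^2$ (so $G = M^T\otimes I + I\otimes M$ and $S=M^4$), takes ${\rm vec}$ to stack columns, and compares $(F^T\otimes I+I\otimes F)\,G$ with a central finite-difference Jacobian. All helper names are made up for this example.

```python
# Sanity check of  ds = (kron(F^T, I) + kron(I, F)) G dm  against finite differences.
# Assumption (not from the post): F(M) = M M, so that S = F(M)^2 = M^4.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(A, B):
    # Kronecker product: block (i, j) of the result is A[i][j] * B
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec(A):
    # Column-stacking vectorization: vec(A)[j * n + i] = A[i][j]
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def S_of(M):
    F = matmul(M, M)          # hypothetical F(M) = M M
    return matmul(F, F)       # S = F(M)^2

def jacobian_formula(M):
    I = identity(len(M))
    F = matmul(M, M)
    G = add(kron(transpose(M), I), kron(I, M))        # d vec(F) / d vec(M) for F = M M
    outer = add(kron(transpose(F), I), kron(I, F))    # kron(F^T, I) + kron(I, F)
    return matmul(outer, G)

def jacobian_fd(M, h=1e-6):
    n = len(M)
    columns = []
    for j in range(n):        # loop in column-major order to match vec
        for i in range(n):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            sp, sm = vec(S_of(Mp)), vec(S_of(Mm))
            columns.append([(a - b) / (2 * h) for a, b in zip(sp, sm)])
    return transpose(columns)

M = [[0.5, 1.0], [-0.3, 0.8]]
J = jacobian_formula(M)
J_fd = jacobian_fd(M)
max_err = max(abs(a - b) for ra, rb in zip(J, J_fd) for a, b in zip(ra, rb))
print("max abs deviation:", max_err)
```

The column of `J` with index `j * n + i` is ${\rm vec}$ of the derivative with respect to $M_{ij}$, so reshaping that column back into an $n\times n$ matrix should give the object the question calls $\frac{d}{dM_{ij}}F(M)^2$.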


In index notation, with the Einstein summation convention, the solution is quite simple: $$\eqalign{ \frac{\partial S_{pq}}{\partial M_{ij}} &= \frac{\partial(F_{pk}F_{kq})}{\partial M_{ij}} \cr &= \frac{\partial F_{pk}}{\partial M_{ij}}\,F_{kq} + F_{pk}\,\frac{\partial F_{kq}}{\partial M_{ij}} }$$
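The index formula translates almost literally into nested loops. The following is a rough sketch, again assuming the hypothetical test function $F(M)=M^2$ (not the asker's $F$), for which $\frac{\partial F_{pk}}{\partial M_{ij}}=\delta_{pi}M_{jk}+M_{pi}\delta_{kj}$. It builds the product-rule expression and compares it with central finite differences of $S=F(M)^2$. The function names are invented for this illustration.

```python
# Product rule in index form:
#   dS_pq/dM_ij = sum_k (dF_pk/dM_ij) F_kq + F_pk (dF_kq/dM_ij)
# Assumption (not from the post): F(M) = M M is used as a test function.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def delta(a, b):
    return 1.0 if a == b else 0.0

def F_of(M):
    return matmul(M, M)

def dF_of(M):
    # dF[p][k][i][j] = dF_pk / dM_ij = delta_pi M_jk + M_pi delta_kj
    n = len(M)
    return [[[[delta(p, i) * M[j][k] + M[p][i] * delta(k, j)
               for j in range(n)] for i in range(n)]
             for k in range(n)] for p in range(n)]

def dS_product_rule(M):
    n = len(M)
    F, dF = F_of(M), dF_of(M)
    # The explicit sum over k replaces the Einstein summation convention
    return [[[[sum(dF[p][k][i][j] * F[k][q] + F[p][k] * dF[k][q][i][j]
                   for k in range(n))
               for j in range(n)] for i in range(n)]
             for q in range(n)] for p in range(n)]

def dS_finite_difference(M, h=1e-6):
    n = len(M)
    out = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            Fp, Fm = F_of(Mp), F_of(Mm)
            Sp, Sm = matmul(Fp, Fp), matmul(Fm, Fm)
            for p in range(n):
                for q in range(n):
                    out[p][q][i][j] = (Sp[p][q] - Sm[p][q]) / (2 * h)
    return out

M = [[0.5, -1.0, 0.2], [0.3, 0.8, -0.4], [1.0, 0.1, 0.6]]
n = len(M)
dS = dS_product_rule(M)
dS_fd = dS_finite_difference(M)
max_err_index = max(abs(dS[p][q][i][j] - dS_fd[p][q][i][j])
                    for p in range(n) for q in range(n)
                    for i in range(n) for j in range(n))
print("max abs deviation:", max_err_index)
```

For fixed $(i,j)$, the slice `dS[p][q][i][j]` over $p,q$ is the matrix $\frac{\partial F}{\partial M_{ij}}\,F + F\,\frac{\partial F}{\partial M_{ij}}$, so the direction $e^{ij}$ enters only once, through the derivative of $F$.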