A question about the general chain rule for differentiating scalar functions w.r.t. matrices


From what I understand after reading some of the answers here and some PDFs on matrix differentiation, the general chain rule for differentiating a scalar function with respect to a matrix is:

Let $g(X)=U$.

$$\left[\frac{d}{d X}f(g(X))\right]_{ij}=\frac{\partial}{\partial X_{ij}}f(g(X)) = \sum_{k}\sum_l \frac{\partial}{\partial U_{kl}}f(U)\,\frac{\partial}{\partial X_{ij}}U_{kl}={\rm Tr}\left(\left(\frac{\partial}{\partial U}f(U)\right)^\intercal \frac{\partial}{\partial X_{ij}}U\right)$$

However, the differential notation is more commonly used. The differential formula I've seen is: if $$df={\rm Tr}\left(A^\intercal\, dX \right)$$ then $$\frac{d}{d X}f(g(X))=A$$

How does one reconcile both notations?

On BEST ANSWER

In the first case, you've simply written
$$\eqalign{ \frac{\partial f}{\partial X} &= \frac{\partial f}{\partial U}:\frac{\partial U}{\partial X} \cr }$$
In the second case, you've stated the definition of the differential in terms of the gradient
$$\eqalign{ df &= A:dX \cr &= \Big(\frac{\partial f}{\partial X}\Big):dX \cr &= \Big(\frac{\partial f}{\partial U}:\frac{\partial U}{\partial X}\Big):dX \cr }$$
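
To spell out the intermediate step in index notation (a sketch, assuming $f$ and $g$ are differentiable; the symbol $A_{ij}$ below is just shorthand for the chain-rule sum in the question): the differential of $f$ with respect to $U$, and the differential of each $U_{kl}$ with respect to $X$, are
$$df = \sum_{k,l}\frac{\partial f}{\partial U_{kl}}\,dU_{kl}, \qquad dU_{kl} = \sum_{i,j}\frac{\partial U_{kl}}{\partial X_{ij}}\,dX_{ij}$$
Substituting the second into the first and swapping the order of summation gives
$$df = \sum_{i,j}\Big(\sum_{k,l}\frac{\partial f}{\partial U_{kl}}\frac{\partial U_{kl}}{\partial X_{ij}}\Big)\,dX_{ij} = \sum_{i,j}A_{ij}\,dX_{ij} = {\rm Tr}\big(A^T\,dX\big)$$
so the matrix $A$ read off from the differential has exactly the entries produced by the component-wise chain rule.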

I'm not sure what needs to be reconciled; the two cases are consistent with one another.
Note, however, that $\frac{\partial U}{\partial X}$ is a fourth-order tensor, which can be tricky to work with.
*[Instead of the functional notation ${\,\rm Tr}\big(A^T\,dX\big)\,$ I've used the product notation $\big(A:dX\big)$, where $A:B={\rm Tr}\big(A^TB\big)=\sum_{i,j}A_{ij}B_{ij}$.]
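
As a sanity check, here is a minimal numerical sketch comparing the two routes. The choices $g(X)=BXC$ and $f(U)=\tfrac12\sum_{k,l}U_{kl}^2$ are my own illustrative assumptions, not something from the question, and all helper names are hypothetical. For this $g$ the fourth-order tensor is $\partial U_{kl}/\partial X_{ij}=B_{ki}C_{jl}$, while the differential route gives $df = U:(B\,dX\,C) = (B^TUC^T):dX$. The code uses only the Python standard library.

```python
import random

def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def g(X, B, C):
    """Illustrative inner map U = B X C (an assumption, not from the question)."""
    return matmul(matmul(B, X), C)

def f(U):
    """Illustrative outer function f(U) = 0.5 * sum_kl U_kl^2, so df/dU = U."""
    return 0.5 * sum(u * u for row in U for u in row)

def grad_index_form(X, B, C):
    """A_ij = sum_kl (df/dU_kl) * (dU_kl/dX_ij), with dU_kl/dX_ij = B_ki * C_jl."""
    U = g(X, B, C)
    return [[sum(U[k][l] * B[k][i] * C[j][l]
                 for k in range(len(U)) for l in range(len(U[0])))
             for j in range(len(X[0]))] for i in range(len(X))]

def grad_differential_form(X, B, C):
    """df = U : (B dX C) = (B^T U C^T) : dX, so A = B^T U C^T."""
    return matmul(matmul(transpose(B), g(X, B, C)), transpose(C))

def grad_finite_difference(X, B, C, h=1e-6):
    """Central differences in each entry X_ij, as an independent check."""
    A = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            A[i][j] = (f(g(Xp, B, C)) - f(g(Xm, B, C))) / (2 * h)
    return A

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two same-shaped matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
B = rand_matrix(2, 3, rng)  # p x m
X = rand_matrix(3, 2, rng)  # m x n
C = rand_matrix(2, 4, rng)  # n x q

A_index = grad_index_form(X, B, C)        # component-wise chain rule
A_diff = grad_differential_form(X, B, C)  # read off from df = A : dX
A_fd = grad_finite_difference(X, B, C)    # numerical reference

print("index form vs differential form:", max_abs_diff(A_index, A_diff))
print("index form vs finite differences:", max_abs_diff(A_index, A_fd))
```

Both printed differences should be at the level of rounding error. Note that the differential route never builds the fourth-order tensor explicitly, which is the practical reason it tends to be easier to work with.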