I want to compute the derivative of a matrix function of the following form: $$\frac{\partial f(\boldsymbol{AX})}{\partial \boldsymbol{X}}$$ where $f(\cdot)$ is a scalar function. I expect the result to have the same shape as the matrix $\boldsymbol{X}$.
How to get the derivative of a matrix function?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 answers below.
Let $Y\!=\!AX$, so that $f\!=\!f(Y)$. I assume that you know how to calculate the derivative $\frac{\partial f}{\partial Y}$ and now wish to calculate $\frac{\partial f}{\partial X}$.
Write down the differential in terms of the Frobenius product ($:$), then switch the independent variable from $Y$ to $X$ using $dY = A\,dX$ and the identity $B:(AC) = (A^TB):C$. $$\eqalign{ df &= \frac{\partial f}{\partial Y} : dY \cr &= \frac{\partial f}{\partial Y} : (A\,dX) \cr &= \left(A^T\frac{\partial f}{\partial Y}\right) : dX \cr\cr \frac{\partial f}{\partial X} &= A^T\frac{\partial f}{\partial Y} \cr }$$
If you do not know how to calculate $\frac{\partial f}{\partial Y}$ and want help with that, then you'll need to give us more information about the function.
If you are uncomfortable with the Frobenius product, you can replace it with the trace function, $\,\,A\!:\!B = {\rm tr}(A^T\!B)$.
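As a quick sanity check, the result $\frac{\partial f}{\partial X} = A^T\frac{\partial f}{\partial Y}$ can be compared against central finite differences. The sketch below is only an illustration: the choice $f(Y)=\tfrac12\sum_{ij}Y_{ij}^2$ (so that $\frac{\partial f}{\partial Y}=Y$), the matrices `A` and `X`, and all helper names are assumptions made for the example, not part of the question.

```python
# Finite-difference check of  df/dX = A^T (df/dY)  with Y = A X.
# Assumed example function: f(Y) = 0.5 * sum(Y_ij^2), whose gradient is Y.

def matmul(P, Q):
    """Plain nested-list matrix product P @ Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def f(Y):
    return 0.5 * sum(y * y for row in Y for y in row)

def grad_f_Y(Y):
    # df/dY for the assumed f above.
    return [row[:] for row in Y]

def analytic_grad_X(A, X):
    # The formula from the answer: A^T * (df/dY evaluated at Y = A X).
    return matmul(transpose(A), grad_f_Y(matmul(A, X)))

def numeric_grad_X(A, X, h=1e-6):
    # Central differences, one entry of X at a time.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(matmul(A, Xp)) - f(matmul(A, Xm))) / (2 * h)
    return G

A = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]       # 2 x 3 (illustrative values)
X = [[0.2, -1.0], [1.5, 0.3], [-0.7, 2.0]]    # 3 x 2 (illustrative values)

G_analytic = analytic_grad_X(A, X)
G_numeric = numeric_grad_X(A, X)
max_err = max(abs(a - n)
              for ra, rn in zip(G_analytic, G_numeric)
              for a, n in zip(ra, rn))
print("max abs difference:", max_err)
```

The analytic gradient has the same shape as `X` (here 3 x 2), which is the shape the question expected.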
Update
When a scalar function ($f$) is applied element-wise to a matrix argument ($Y$), the result $f=f(Y)$ and its element-wise derivative $f'=f'(Y)$ are matrices of the same shape as $Y$, and the differential can be expressed in terms of the Hadamard ($\circ$) product as $$ \eqalign { df &= f'\circ dY \cr } $$ We can use the single-entry matrix $E_{ij}$ and the Frobenius ($:$) product to isolate a single element $$ \eqalign { df_{ij} &= E_{ij}:df \cr &= E_{ij}:(f'\circ dY) \cr &= (E_{ij}\circ f'): dY \cr } $$ Finally, the sigmoid function mentioned in the comments is interesting because its derivative is $f'=(f-f^2)$, where the square is taken element-wise, which allows us to write $$ \eqalign { df_{ij} &= \big(E_{ij}\circ(f-f^2)\big) : dY \cr } $$
Since $df_{ij}=(\frac{\partial f_{ij}} {\partial Y}:dY)$, the derivative of this element with respect to $Y$ is $$ \eqalign { \frac{\partial f_{ij}} {\partial Y} &= E_{ij}\circ(f-f^2) \cr } $$ and with respect to $X$ it's $$ \eqalign { \frac{\partial f_{ij}} {\partial X} &= A^T\,\frac{\partial f_{ij}} {\partial Y} \cr } $$
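The element-wise result can be checked numerically in the same way. The following is a rough sketch for the sigmoid case, under the assumption that $Y=AX$ as above; the matrices, the chosen index $(i,j)$, and the helper names are illustrative only.

```python
import math

# Check  d f_ij / dX = A^T (E_ij o (f - f^2))  for an element-wise sigmoid,
# where "o" is the Hadamard product and Y = A X.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def F(A, X):
    # Element-wise sigmoid of Y = A X.
    return [[sigmoid(y) for y in row] for row in matmul(A, X)]

def analytic_grad(A, X, i, j):
    S = F(A, X)
    # E_ij o (f - f^2): zero everywhere except entry (i, j).
    dF_dY = [[0.0] * len(S[0]) for _ in S]
    dF_dY[i][j] = S[i][j] - S[i][j] ** 2
    return matmul(transpose(A), dF_dY)

def numeric_grad(A, X, i, j, h=1e-6):
    # Central differences of the single output element f_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[k][l] += h
            Xm[k][l] -= h
            G[k][l] = (F(A, Xp)[i][j] - F(A, Xm)[i][j]) / (2 * h)
    return G

A = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]       # 2 x 3 (illustrative values)
X = [[0.2, -1.0], [1.5, 0.3], [-0.7, 2.0]]    # 3 x 2 (illustrative values)
i, j = 1, 0                                   # which element f_ij to differentiate

G_analytic = analytic_grad(A, X, i, j)
G_numeric = numeric_grad(A, X, i, j)
max_err = max(abs(a - n)
              for ra, rn in zip(G_analytic, G_numeric)
              for a, n in zip(ra, rn))
print("max abs difference:", max_err)
```

Only column $j$ of the resulting matrix is non-zero, since $f_{ij}$ depends on $X$ only through column $j$ of $Y=AX$.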
It is basically the same as for vectors. The chain rule yields the total derivative (for any matrix $H$ of the same size as $X$) $$ D_X (f(AX)) [H] = f'(AX)[AH] = \langle \nabla f(AX), AH \rangle = \langle A^T \nabla f(AX), H\rangle. $$ Thus, the gradient is $A^T \nabla f(AX)$. Here, the inner product is given by $\langle P,Q \rangle = \operatorname{trace}(P^TQ)$, and the gradient is the matrix of partial derivatives arranged in the same shape as $X$.
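The two steps in this argument, moving $A$ across the inner product and identifying the directional derivative, can also be checked numerically. This is a minimal sketch assuming the example function $f(Y)=\sum_{ij}\sin(Y_{ij})$, so that $\nabla f(Y)=\cos(Y)$ element-wise; the function, the matrices, and the helper names are assumptions for illustration.

```python
import math

# Check  <grad f(AX), A H> = <A^T grad f(AX), H>  and compare both with a
# finite-difference directional derivative of X -> f(A X) in direction H.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inner(P, Q):
    # trace(P^T Q), i.e. the sum of element-wise products.
    return sum(p * q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def f(Y):
    # Assumed example: f(Y) = sum of sin(Y_ij).
    return sum(math.sin(y) for row in Y for y in row)

def grad_f(Y):
    return [[math.cos(y) for y in row] for row in Y]

def shifted(X, t, H):
    # Returns X + t * H.
    return [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]

A = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]       # 2 x 3 (illustrative values)
X = [[0.2, -1.0], [1.5, 0.3], [-0.7, 2.0]]    # 3 x 2 (illustrative values)
H = [[0.5, 0.1], [-0.3, 0.8], [1.0, -0.4]]    # direction, same shape as X

G = grad_f(matmul(A, X))
lhs = inner(G, matmul(A, H))                  # <grad f(AX), A H>
rhs = inner(matmul(transpose(A), G), H)       # <A^T grad f(AX), H>

t = 1e-6
fd = (f(matmul(A, shifted(X, t, H))) - f(matmul(A, shifted(X, -t, H)))) / (2 * t)
print(lhs, rhs, fd)
```

The three printed numbers should agree up to rounding and finite-difference error.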