I present two versions of the chain rule for composite matrix functions. Their statements look similar to me, but the results seem different.
From the book Convex Optimization & Euclidean Distance Geometry, Appendix A, page 7 of the document:
- Given dimensionally compatible matrix-valued functions of a matrix variable, $f(\mathbf{X})$ and $g(\mathbf{X})$,
$\ \ \ \ \ \ \ \nabla_X \ g(f(\mathbf{X})^T)=\nabla_X \ f^T \ \nabla_f \ g$
Now, from the Matrix Cookbook, page 15 of the document:
Let $\mathbf{U} = f(\mathbf{X})$, the goal is to find the derivative of the function $g(\mathbf{U})$ with respect to $\mathbf{X}$:
$\ \ \ \ \ \ \ \ \ \frac{\partial g(f(\mathbf{X}))}{\partial X_{ij}}= \operatorname{Tr} \left(\left(\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}\right)^T \frac{\partial \mathbf{U}}{\partial X_{ij}}\right)$
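To make the Cookbook formula concrete, here is a minimal numerical sketch. It reads the second factor inside the trace as $\frac{\partial \mathbf{U}}{\partial X_{ij}}$ and takes $g$ to be scalar-valued. The particular functions $f(\mathbf{X}) = \mathbf{X}\mathbf{X}$ and $g(\mathbf{U}) = \sum_{kl} U_{kl}^2$ are my own illustrative choices, not taken from either reference. The sketch builds the derivative matrix one element at a time from the trace formula and compares it against central finite differences.

```python
import numpy as np

# Illustrative choices (assumptions, not from either reference):
#   f(X) = X @ X          (matrix-valued, square X)
#   g(U) = sum(U ** 2)    (scalar-valued, squared Frobenius norm)

def f(X):
    return X @ X

def g(U):
    return np.sum(U ** 2)

def dg_dU(U):
    # Derivative of sum(U ** 2) with respect to U.
    return 2.0 * U

def dU_dXij(X, i, j):
    # Derivative of X @ X with respect to the single entry X[i, j]:
    # E @ X + X @ E, where E is the single-entry matrix with E[i, j] = 1.
    E = np.zeros_like(X)
    E[i, j] = 1.0
    return E @ X + X @ E

def chain_rule_gradient(X):
    # Element-wise chain rule: dg/dX_ij = Tr((dg/dU)^T dU/dX_ij).
    G = dg_dU(f(X))
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            grad[i, j] = np.trace(G.T @ dU_dXij(X, i, j))
    return grad

def finite_difference_gradient(X, eps=1e-6):
    # Central differences on the composite g(f(X)), one entry at a time.
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (g(f(X + E)) - g(f(X - E))) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
print(np.allclose(chain_rule_gradient(X), finite_difference_gradient(X), atol=1e-5))
```

Stacking the element-wise results into `grad` is what turns the per-entry formula into a full derivative matrix for this scalar-valued $g$.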
I understand that the second result gives only one element of the derivative matrix, whereas the first equation gives the entire derivative matrix. However, I can't see an obvious relation between the first and the second equation. Are the two equivalent?