What is the correct chain rule for composite matrix functions?

573 Views Asked by Autonomous At 13 Apr 2025 - 11:21

I present two versions of the chain rule for composite matrix functions. Their statements look similar to me, but the results seem different.

From the book Convex Optimization & Euclidean Distance Geometry, Appendix A, page 7 of the document:

Given dimensionally compatible matrix-valued functions of matrix variable $f(\mathbf{X})$ and $g(\mathbf{X})$

$$\nabla_X \, g(f(\mathbf{X})^T) = \nabla_X f^T \, \nabla_f \, g$$

Now, from the Matrix Cookbook, page 15 of the document,

Let $\mathbf{U} = f(\mathbf{X})$, the goal is to find the derivative of the function $g(\mathbf{U})$ with respect to $\mathbf{X}$:

$$\frac{\partial g(f(\mathbf{X}))}{\partial X_{ij}} = \operatorname{Tr}\left(\left(\frac{\partial g(\mathbf{U})}{\partial \mathbf{U}}\right)^T \frac{\partial \mathbf{U}}{\partial X_{ij}}\right)$$
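For what it's worth, the Matrix Cookbook rule can be checked numerically when the second factor inside the trace is read as $\frac{\partial \mathbf{U}}{\partial X_{ij}}$. The sketch below is only an illustration under assumptions of my own: the choices $f(\mathbf{X}) = \mathbf{X}^T\mathbf{X}$ and $g(\mathbf{U}) = \operatorname{Tr}(\mathbf{U}^T\mathbf{U})$ and all function names are hypothetical and do not come from either reference. It builds the derivative matrix one entry at a time from the trace formula and compares it with central finite differences and with the closed form $4\mathbf{X}\mathbf{X}^T\mathbf{X}$ that holds for this particular choice.

```python
import numpy as np


def f(X):
    # Hypothetical inner function (an assumption for this demo): U = X^T X.
    return X.T @ X


def g(U):
    # Hypothetical scalar outer function: g(U) = Tr(U^T U), the sum of squared entries.
    return np.sum(U ** 2)


def dg_dU(U):
    # Derivative of g with respect to U for the choice above.
    return 2.0 * U


def dU_dXij(X, i, j):
    # Derivative of U = X^T X with respect to the single entry X_ij:
    # E^T X + X^T E, where E is the single-entry matrix with a 1 at (i, j).
    E = np.zeros_like(X)
    E[i, j] = 1.0
    return E.T @ X + X.T @ E


def chain_rule_gradient(X):
    # Assemble the full derivative matrix entry by entry using
    # d g / d X_ij = Tr( (dg/dU)^T  dU/dX_ij ).
    G = dg_dU(f(X))
    grad = np.empty_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            grad[i, j] = np.trace(G.T @ dU_dXij(X, i, j))
    return grad


def finite_difference_gradient(X, eps=1e-6):
    # Central differences on g(f(X)), one entry of X at a time.
    grad = np.empty_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (g(f(X + E)) - g(f(X - E))) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))

analytic = chain_rule_gradient(X)
numeric = finite_difference_gradient(X)
closed_form = 4.0 * X @ X.T @ X  # valid only for this particular f and g

print(np.allclose(analytic, numeric, atol=1e-5))
print(np.allclose(analytic, closed_form))
```

Looping over all $(i, j)$ is what turns the element-wise trace formula into the entire derivative matrix, so the per-entry statement and a whole-matrix statement describe the same object in this scalar-valued case.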

I understand that the second result gives only one element of the derivative matrix, whereas the first equation gives the entire derivative matrix. However, I can't see an obvious relation between the first and the second equation. Are the two equal?
