How to apply the chain rule involving matrices in this situation?


I have a function that takes matrices as inputs and returns a scalar. The function looks something like this:

$f(A*B)$

where $A$ and $B$ are two square matrices of the same shape.

Now I want to calculate

$$\nabla_A f(A*B)$$

I know the gradient when $B$ is independent of $A$, but if $B$ were to depend on $A$, what would the expression look like using the chain rule?

Let's say the result of the gradient when $B$ is independent of $A$ is $G$ (which is also a matrix). Will the expression then look something like this? I am not quite sure whether this is correct:

$$G*(\nabla_A B)$$

I have no idea if this is correct, or how to proceed.

There are two answers below.

Answer 1

Let $\mathbb{R}^{n\times n}$ be the set of $n\times n$ matrices with coefficients in $\mathbb{R}$, and let $A\cdot B$ denote the product of matrices $A\in\mathbb{R}^{n\times n}$ and $B\in\mathbb{R}^{n\times n}$. Suppose that $f:\mathbb{R}^{n\times n}\to \mathbb{R}$ is differentiable, i.e. there exists a linear transformation $$ \begin{array}{rrl} D f(A):&\hspace{-4mm}\mathbb{R}^{n\times n}&\hspace{-4mm}\to \mathbb{R}\\ &V &\hspace{-4mm}\mapsto \nabla f(A): V\\ \end{array} $$ such that $$ \lim_{\|V\|_{\mathbb{R}^{n\times n}} \to 0} \frac{\left| f(A+V)-f(A)-\nabla f(A): V \right|}{\|V\|_{\mathbb{R}^{n\times n}}} =0 $$ Here the matrix norm is the Frobenius norm $\|V\|_{\mathbb{R}^{n\times n}} = \sqrt{\sum_{i=1}^n\sum_{j=1}^n V_{ij}^2}$, and $\nabla f(A): V = \sum_{i,j}(\nabla f(A))_{ij}V_{ij}$ is the corresponding inner product.

Then the derivative of the function $$ \mathbb{R}^{n\times n}\ni A\longmapsto f(A\cdot B)\in\mathbb{R} $$ is the linear transformation $$ V\longmapsto \nabla f(A\cdot B ): (V\cdot B) $$ because $$ \lim_{\|V\|_{\mathbb{R}^{n\times n}} \to 0} \frac{\left| f(A\cdot B+V\cdot B)-f(A\cdot B)-\nabla f(A\cdot B): (V\cdot B) \right|}{\|V\|_{\mathbb{R}^{n\times n}}} =0, $$ which follows from the differentiability of $f$ at $A\cdot B$ together with the bound $\|V\cdot B\|_{\mathbb{R}^{n\times n}}\le \|V\|_{\mathbb{R}^{n\times n}}\,\|B\|_{\mathbb{R}^{n\times n}}$. Equivalently, as a matrix, $\nabla_A f(A\cdot B) = \nabla f(A\cdot B)\cdot B^T$.
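This directional-derivative formula is easy to sanity-check numerically. The sketch below (a minimal check with NumPy; the test function $f(X)=\sum_{ij}\sin X_{ij}$, whose gradient is $\cos X$ elementwise, is an arbitrary choice) compares $\nabla f(A\cdot B):(V\cdot B)$ against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Arbitrary test function f: R^{n x n} -> R with a known gradient:
# f(X) = sum_ij sin(X_ij), so (grad f)(X) = cos(X) elementwise.
f = lambda X: np.sin(X).sum()
grad_f = lambda X: np.cos(X)

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))  # perturbation direction

# Claimed directional derivative of A -> f(A B): <grad f(AB), V B>_F
claimed = np.sum(grad_f(A @ B) * (V @ B))

# Central finite-difference approximation of the same quantity
eps = 1e-6
numeric = (f((A + eps * V) @ B) - f((A - eps * V) @ B)) / (2 * eps)

print(abs(claimed - numeric))  # should be tiny
```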

Answer 2

Let $$\eqalign{ A &= X \cr B &= I \cr }$$ You say that you know the gradient in this case: $$\eqalign{ G &= \nabla_X f(X) \cr }$$ Now substitute $X=AB$ and find the differential in this case: $$\eqalign{ df &= G:dX \cr &= G:(A\,dB+dA\,B) \cr &= A^TG:dB + GB^T:dA \cr }$$ where the colon represents the trace/Frobenius product, i.e. $$A:B = {\rm tr}(A^TB)$$

To proceed any further, we need to know the nature of the dependence of $B$ on $A$. Let's assume that we know the following: $$dB = {\mathcal H}:dA$$ where ${\mathcal H}$ is a 4th-order tensor, whose components are $$\eqalign{ {\mathcal H}_{ijkl} &= \frac{\partial B_{ij}}{\partial A_{kl}} }$$ Substitute this dependence into the differential to find the gradient: $$\eqalign{ df &= A^TG:{\mathcal H}:dA + GB^T:dA \cr \nabla_A f &= A^TG:{\mathcal H} + GB^T \cr }$$ In most cases, it's easier to stick to differentials and avoid dealing with the 4th-order tensor.
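The differential identity $df = A^TG:dB + GB^T:dA$ can itself be checked numerically by perturbing $A$ and $B$ in independent directions. A minimal sketch (assuming NumPy and the same arbitrary test function $f(X)=\sum_{ij}\sin X_{ij}$, so that $G=\cos(AB)$ elementwise):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Arbitrary test function with known gradient G = cos(X) elementwise
f = lambda X: np.sin(X).sum()

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
dA = rng.standard_normal((n, n))  # independent perturbation of A
dB = rng.standard_normal((n, n))  # independent perturbation of B

G = np.cos(A @ B)  # G = grad_X f(X) evaluated at X = A B

frob = lambda X, Y: np.trace(X.T @ Y)  # Frobenius product X:Y = tr(X^T Y)

# Predicted first-order change: df = A^T G : dB + G B^T : dA
predicted = frob(A.T @ G, dB) + frob(G @ B.T, dA)

# Central finite difference along the joint perturbation
eps = 1e-6
numeric = (f((A + eps * dA) @ (B + eps * dB))
           - f((A - eps * dA) @ (B - eps * dB))) / (2 * eps)

print(abs(predicted - numeric))  # should be tiny
```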

For example, let's assume $$B=\mu A^2$$ Then $$dB=\mu\,dA\,A+\mu A\,dA$$ Substituting this into the differential expression leads to $$\eqalign{ df &= A^TG:dB + GB^T:dA \cr &= A^TG:(\mu\,dA\,A+\mu A\,dA) + G(\mu A^2)^T:dA \cr &= \mu A^TGA^T:dA + \mu A^TA^TG:dA + \mu GA^TA^T:dA \cr \nabla_A f &= \mu\,(A^TGA^T + A^TA^TG + GA^TA^T) \cr }$$
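This closed-form gradient can be verified entrywise against a finite-difference gradient of $g(A)=f(\mu A^3)$. A sketch, again assuming NumPy and the arbitrary choice $f(X)=\sum_{ij}\sin X_{ij}$ (so $G=\cos$ applied elementwise to $X=AB=\mu A^3$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 4, 0.7

f = lambda X: np.sin(X).sum()   # arbitrary test function
grad_f = lambda X: np.cos(X)    # its elementwise gradient

A = rng.standard_normal((n, n))
B = mu * A @ A                  # B = mu * A^2 depends on A
G = grad_f(A @ B)               # G evaluated at X = A B = mu * A^3

# Closed-form gradient from the derivation above
closed = mu * (A.T @ G @ A.T + A.T @ A.T @ G + G @ A.T @ A.T)

# Finite-difference gradient of g(A) = f(mu * A^3), entry by entry
def g(A):
    return f(mu * A @ A @ A)

eps = 1e-6
num = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        num[i, j] = (g(A + E) - g(A - E)) / (2 * eps)

print(np.max(np.abs(closed - num)))  # should be tiny
```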