How to apply the chain rule involving matrices in this situation?


I have a function that takes matrices as inputs and returns a scalar. The function looks something like this:

$f(A*B)$

where $A$ and $B$ are two square matrices of the same shape.

Now I want to calculate

$$\nabla_A f(A*B)$$

I know the gradient when $B$ is independent of $A$, but if $B$ were to depend on $A$, what would the expression look like using the chain rule?

Let's say the result of the gradient when $B$ is independent of $A$ is $G$ (which is also a matrix). Will the expression then look something like this? I am not quite sure whether this is correct:

$$G*(\nabla_A B)$$

I have no idea if this is correct, or how to proceed.

There are two answers below.

Answer 1

Let $\mathbb{R}^{n\times n}$ be the set of $n\times n$ matrices with coefficients in $\mathbb{R}$, and let $A\cdot B$ denote the product of matrices $A\in\mathbb{R}^{n\times n}$ and $B\in\mathbb{R}^{n\times n}$. Suppose that $f:\mathbb{R}^{n\times n}\to \mathbb{R}$ is differentiable, i.e. there exists a linear transformation $$ \begin{array}{rrl} D f(A):&\hspace{-4mm}\mathbb{R}^{n\times n}&\hspace{-4mm}\to \mathbb{R}\\ &V &\hspace{-4mm}\mapsto \nabla f(A): V\\ \end{array} $$ such that $$ \lim_{\|V\|_{\mathbb{R}^{n\times n}} \to 0} \frac{\left| f(A+V)-f(A)-\nabla f(A): V \right|}{\|V\|_{\mathbb{R}^{n\times n}}} =0 $$ Here the matrix norm is the Frobenius norm $\|V\|_{\mathbb{R}^{n\times n}} = \sqrt{\sum_{i=1}^n\sum_{j=1}^n V_{ij}^2}$, and $\nabla f(A): V = \sum_{i,j}(\nabla f(A))_{ij}V_{ij}$ is the corresponding inner product.

Then the derivative of the function $$ \mathbb{R}^{n\times n}\ni A\longmapsto f(A\cdot B)\in\mathbb{R} $$ is the linear transformation $$ V\longmapsto \nabla f(A\cdot B ): (V\cdot B) $$ because $$ \lim_{\|V\|_{\mathbb{R}^{n\times n}} \to 0} \frac{\left| f(A\cdot B+V\cdot B)-f(A\cdot B)-\nabla f(A\cdot B): (V\cdot B) \right|}{\|V\|_{\mathbb{R}^{n\times n}}} =0, $$ which follows from the differentiability of $f$ at $A\cdot B$ together with the bound $\|V\cdot B\|_{\mathbb{R}^{n\times n}}\le \|V\|_{\mathbb{R}^{n\times n}}\,\|B\|_{\mathbb{R}^{n\times n}}$. Equivalently, as a matrix, $\nabla_A f(A\cdot B) = \nabla f(A\cdot B)\cdot B^T$.
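This directional-derivative formula is easy to sanity-check numerically. The sketch below (a minimal check with NumPy; the test function $f(X)=\sum_{ij}\sin X_{ij}$, whose gradient is $\cos X$ elementwise, is an arbitrary choice) compares $\nabla f(A\cdot B):(V\cdot B)$ against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Arbitrary test function f: R^{n x n} -> R with a known gradient:
# f(X) = sum_ij sin(X_ij), so (grad f)(X) = cos(X) elementwise.
f = lambda X: np.sin(X).sum()
grad_f = lambda X: np.cos(X)

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))  # perturbation direction

# Claimed directional derivative of A -> f(A B): <grad f(AB), V B>_F
claimed = np.sum(grad_f(A @ B) * (V @ B))

# Central finite-difference approximation of the same quantity
eps = 1e-6
numeric = (f((A + eps * V) @ B) - f((A - eps * V) @ B)) / (2 * eps)

print(abs(claimed - numeric))  # should be tiny
```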

Answer 2

Let $$\eqalign{ A &= X \cr B &= I \cr }$$ You say that you know the gradient in this case: $$\eqalign{ G &= \nabla_X f(X) \cr }$$ Now substitute $X=AB$ and find the differential in this case: $$\eqalign{ df &= G:dX \cr &= G:(A\,dB+dA\,B) \cr &= A^TG:dB + GB^T:dA \cr }$$ where the colon represents the trace/Frobenius product, i.e. $$A:B = {\rm tr}(A^TB)$$

To proceed any further, we need to know the nature of the dependence of $B$ on $A$. Let's assume that we know the following: $$dB = {\mathcal H}:dA$$ where ${\mathcal H}$ is a 4th-order tensor, whose components are $$\eqalign{ {\mathcal H}_{ijkl} &= \frac{\partial B_{ij}}{\partial A_{kl}} }$$ Substitute this dependence into the differential to find the gradient: $$\eqalign{ df &= A^TG:{\mathcal H}:dA + GB^T:dA \cr \nabla_A f &= A^TG:{\mathcal H} + GB^T \cr }$$ In most cases, it's easier to stick to differentials and avoid dealing with the 4th-order tensor.
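The differential identity $df = A^TG:dB + GB^T:dA$ can itself be checked numerically by perturbing $A$ and $B$ in independent directions. A minimal sketch (assuming NumPy and the same arbitrary test function $f(X)=\sum_{ij}\sin X_{ij}$, so that $G=\cos(AB)$ elementwise):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Arbitrary test function with known gradient G = cos(X) elementwise
f = lambda X: np.sin(X).sum()

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
dA = rng.standard_normal((n, n))  # independent perturbation of A
dB = rng.standard_normal((n, n))  # independent perturbation of B

G = np.cos(A @ B)  # G = grad_X f(X) evaluated at X = A B

frob = lambda X, Y: np.trace(X.T @ Y)  # Frobenius product X:Y = tr(X^T Y)

# Predicted first-order change: df = A^T G : dB + G B^T : dA
predicted = frob(A.T @ G, dB) + frob(G @ B.T, dA)

# Central finite difference along the joint perturbation
eps = 1e-6
numeric = (f((A + eps * dA) @ (B + eps * dB))
           - f((A - eps * dA) @ (B - eps * dB))) / (2 * eps)

print(abs(predicted - numeric))  # should be tiny
```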

For example, let's assume $$B=\mu A^2$$ Then $$dB=\mu\,dA\,A+\mu A\,dA$$ Substituting this into the differential expression leads to $$\eqalign{ df &= A^TG:dB + GB^T:dA \cr &= A^TG:(\mu\,dA\,A+\mu A\,dA) + G(\mu A^2)^T:dA \cr &= \mu A^TGA^T:dA + \mu A^TA^TG:dA + \mu GA^TA^T:dA \cr \nabla_A f &= \mu\,(A^TGA^T + A^TA^TG + GA^TA^T) \cr }$$
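This closed-form gradient can be verified entrywise against a finite-difference gradient of $g(A)=f(\mu A^3)$. A sketch, again assuming NumPy and the arbitrary choice $f(X)=\sum_{ij}\sin X_{ij}$ (so $G=\cos$ applied elementwise to $X=AB=\mu A^3$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 4, 0.7

f = lambda X: np.sin(X).sum()   # arbitrary test function
grad_f = lambda X: np.cos(X)    # its elementwise gradient

A = rng.standard_normal((n, n))
B = mu * A @ A                  # B = mu * A^2 depends on A
G = grad_f(A @ B)               # G evaluated at X = A B = mu * A^3

# Closed-form gradient from the derivation above
closed = mu * (A.T @ G @ A.T + A.T @ A.T @ G + G @ A.T @ A.T)

# Finite-difference gradient of g(A) = f(mu * A^3), entry by entry
def g(A):
    return f(mu * A @ A @ A)

eps = 1e-6
num = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        num[i, j] = (g(A + E) - g(A - E)) / (2 * eps)

print(np.max(np.abs(closed - num)))  # should be tiny
```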