Please explain $\frac{\partial}{\partial a_{ij}} \sum_{i=1}^m a_{1i} b_{i1} + \cdots + \sum_{i=1}^m a_{ni} b_{in} = b_{ji}$


I need a detailed, step-by-step explanation, please. This is one step from the broader proof 1 of $\nabla_A \mathrm{tr} AB=B^T$, whose preceding steps I understand up to this point. This is a totally new area for me, so please be explicit.

The exact thing, I think, that I am struggling with in this and similar proofs is the action of the derivative on indices. I know the product rule and other rules of differential calculus pretty well, but I've not done them involving indices or at least at that level of detail. Why $b_{ji}$, the transpose yes, but how did the indices end up that way at the end? What exact process produced the indices? It is actually hard to find resources that work through the tedious calculations (I suspect) that are required to truly follow this.

Thanks in advance!


There are 2 best solutions below


Let $$f_k(a_{11},a_{12},\ldots,a_{nm},b_{11},b_{12},\ldots,b_{mn}):=\sum_{l=1}^m a_{kl}b_{lk},$$ so that the derivative in question is $$\frac{\partial}{\partial a_{ij}}\sum_{k=1}^n f_k=\sum_{k=1}^n \frac{\partial}{\partial a_{ij}}f_k.$$ (The summation index in your expression has been renamed to $l$ so that it does not clash with the fixed index $i$ in $a_{ij}$.) Only $f_i$ contains entries from row $i$ of $A$, so $f_k$ does not depend on $a_{ij}$ if $k\neq i$, and we have $$\sum_{k=1}^n \frac{\partial}{\partial a_{ij}}f_k=\frac{\partial}{\partial a_{ij}}f_i.$$ Within $f_i$, the entries of $B$ are constants and $\frac{\partial a_{il}}{\partial a_{ij}}$ is $1$ if $l=j$ and $0$ otherwise, so only the $l=j$ term of the sum survives: \begin{align*} \frac{\partial}{\partial a_{ij}}f_i &=\frac{\partial}{\partial a_{ij}}\sum_{l=1}^m a_{il}b_{li} =\sum_{l=1}^m \frac{\partial a_{il}}{\partial a_{ij}}b_{li}\\ &=\frac{\partial a_{ij}}{\partial a_{ij}}b_{ji}=b_{ji}. \end{align*}
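
To see where the swapped indices come from, it may help to write out the smallest non-trivial case by hand. The choice $n=m=2$ here is only an illustration, not part of the original proof. In that case $$\operatorname{tr}(AB)=\underbrace{a_{11}b_{11}+a_{12}b_{21}}_{f_1}+\underbrace{a_{21}b_{12}+a_{22}b_{22}}_{f_2}.$$ Each entry $a_{ij}$ appears exactly once, and it is always multiplied by $b_{ji}$. Differentiating with respect to one entry while treating all the others as constants therefore gives, for example, $$\frac{\partial}{\partial a_{12}}\operatorname{tr}(AB)=b_{21},\qquad \frac{\partial}{\partial a_{21}}\operatorname{tr}(AB)=b_{12},$$ which are exactly the $(1,2)$ and $(2,1)$ entries of $B^T$.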


Using the Einstein summation convention (repeated indices are summed over), we have $$\operatorname{Tr}(AB)=A_{ij}B_{ji},$$ which means that $$\frac{\partial}{\partial A_{pq}} A_{ij}B_{ji}=\delta_{ip}\delta_{jq}B_{ji}=B_{qp}.$$ This happens because $$\frac{\partial A_{ij}}{\partial A_{pq}}=\delta_{ip}\delta_{jq},$$ that is, the entries of $A$ are independent variables, so the derivative is $1$ when both indices match and $0$ otherwise. The same thing happens for functions $\mathbb{R}^n \to \mathbb{R}$. For example, let $f(x,y,z)=x^2+y^2+z$. Then we have that $\partial_x f = 2x$, $\partial_y f=2y$ and $\partial_z f=1$. It's the same, but instead of different letters, we have indices. (It's like using $x_1$, $x_2$ and $x_3$ instead of $x$, $y$ and $z$.)
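
If it helps to see the index bookkeeping confirmed numerically, here is a minimal sketch in plain Python. It perturbs one entry $A_{pq}$ at a time and compares the resulting finite-difference derivative of $\operatorname{Tr}(AB)$ with $B_{qp}$. The function names `trace_of_product` and `numerical_gradient`, the step size `h`, and the sample matrices are all illustrative choices of mine, not anything taken from the proof.

```python
def trace_of_product(A, B):
    """Return tr(AB) = sum over k, l of A[k][l] * B[l][k], for A (n x m) and B (m x n)."""
    n, m = len(A), len(B)
    return sum(A[k][l] * B[l][k] for k in range(n) for l in range(m))


def numerical_gradient(A, B, h=1e-6):
    """Approximate d tr(AB) / d A[p][q] for every (p, q) by central differences."""
    n, m = len(A), len(B)
    grad = [[0.0] * m for _ in range(n)]
    for p in range(n):
        for q in range(m):
            # Copy A twice and nudge only the single entry A[p][q].
            plus = [row[:] for row in A]
            minus = [row[:] for row in A]
            plus[p][q] += h
            minus[p][q] -= h
            grad[p][q] = (trace_of_product(plus, B)
                          - trace_of_product(minus, B)) / (2 * h)
    return grad


# Hypothetical sample data: A is 2 x 3, B is 3 x 2.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[7.0, 8.0],
     [9.0, 10.0],
     [11.0, 12.0]]

grad = numerical_gradient(A, B)

# B transposed has the same shape as A: entry (p, q) is B[q][p].
B_transpose = [[B[q][p] for q in range(len(B))] for p in range(len(B[0]))]

# Largest entrywise gap between the numerical gradient and B^T.
max_error = max(abs(grad[p][q] - B_transpose[p][q])
                for p in range(len(A)) for q in range(len(B)))
print(max_error < 1e-4)
```

Because $\operatorname{Tr}(AB)$ is linear in each entry of $A$, the finite difference should agree with $B_{qp}$ up to floating-point rounding, so the gradient has the shape of $A$ and the entries of $B^T$.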