Derivatives Across Summations


So, I took one intro course in tensor calculus and this problem reminds me of it, except I can't quite recall how derivatives with respect to components work, or what those derivatives produce. Consider the following example:

[image of the worked example; its equations (7) and (8) are discussed below]

The transition from (7) to (8) is where I'm lost. We have

$$ \frac{\partial f}{\partial X_{jk}} = \frac{\partial}{\partial X_{jk}} \sum_i \sum_j \sum_k A_{ij}X_{jk}B_{ki} $$

Now, the ultimate goal here is to find $\frac{\partial f}{\partial X}$, so why did we choose to differentiate on the indices $(j,k)$? Clearly that's a convenient choice, but why is it acceptable, and what is the intuition or motivation for it (this is my main question)? Given that it is the correct choice, I believe the derivative and subsequent work go somewhat like:

\begin{align} \frac{\partial}{\partial X_{jk}} \sum_i \sum_j \sum_k A_{ij}X_{jk}B_{ki} &= \sum_i \sum_j \sum_k A_{ij}\frac{\partial}{\partial X_{jk}}\left(X_{jk}\right)B_{ki} \\ &= \sum_i \sum_j \sum_k A_{ij}1_{jk}B_{ki} \\ &= \sum_i \sum_j A_{ij} \sum_k 1_{jk}B_{ki} \tag{$\star$} \\ &= \sum_i \sum_j A_{ij}B_{ji} \end{align}

But my steps in/around $(\star)$ are not legitimate. I sense that there should be some Kronecker delta action around there. In tensor notation I get:

\begin{align} \frac{\partial}{\partial X_{jk}} A_{ij}X_{jk}B_{ki} &= A_{ij}\frac{\partial}{\partial X_{jk}}\left(X_{jk}\right)B_{ki} \\ &= A_{ij}B_{ki} \\ &= \left[BA\right]_{kj} \\ &= \left[BA\right]^T_{jk} \\ &= \left[A^TB^T\right]_{jk} \\ \end{align}

which are the components of $A^TB^T$. So that seems to work fine (am I correct?). If so, how do I reconcile the notations in order to perform the same calculation in index notation?


There are 2 best solutions below


I can see the confusion in the notation, so let's change things slightly to make it clearer.

First let's note what our coordinates are: For convenience, all indices range from $1$ to $n$. We have coordinates of the form $(X_{ij})=(X_{ij}, 1\leq i,j\leq n)$. Then for fixed matrices $A$ and $B$, we have that $X_{ij}(A)=A_{ij}$ and $X_{ij}(B)=B_{ij}$. Then our function $f$, written in coordinates, is $$f((X_{ij}))=\sum_i\sum_j\sum_kA_{ij}X_{jk}B_{ki}.$$ Now let's differentiate with respect to a fixed coordinate $X_{ab}$. Then \begin{align*} \frac{\partial f}{\partial X_{ab}}&=\frac{\partial}{\partial X_{ab}}\sum_i\sum_j\sum_kA_{ij}X_{jk}B_{ki}\\ &=\sum_i\sum_j\sum_kA_{ij}B_{ki}\frac{\partial X_{jk}}{\partial X_{ab}}\\ &=\sum_i\sum_j\sum_kA_{ij}B_{ki}\delta_{aj}\delta_{bk}\\ &=\sum_iA_{ia}B_{bi}\\ &=\sum_iB_{bi}A_{ia}\\ &=X_{ba}(BA)\\ &=X_{ab}((BA)^T)\\ &=X_{ab}(A^TB^T). \end{align*}

Now, $$\frac{\partial f}{\partial X_{ab}}=X_{ab}\left(\frac{\partial f}{\partial X}\right)$$ by definition, and hence $$\frac{\partial f}{\partial X}=A^TB^T.$$
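This result is easy to check numerically. Since $f(X)=\sum_{i,j,k}A_{ij}X_{jk}B_{ki}=\operatorname{tr}(AXB)$, a finite-difference approximation of $\partial f/\partial X$ should match $A^TB^T$ entry by entry. A minimal sketch with NumPy (matrix size and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A, B, X = rng.standard_normal((3, n, n))

# f(X) = sum_{i,j,k} A_ij X_jk B_ki = tr(A X B)
def f(X):
    return np.trace(A @ X @ B)

# Central finite differences: perturb one coordinate X_ab at a time.
h = 1e-6
grad = np.zeros((n, n))
for a in range(n):
    for b in range(n):
        E = np.zeros((n, n))
        E[a, b] = h
        grad[a, b] = (f(X + E) - f(X - E)) / (2 * h)

# The derivation above predicts df/dX = A^T B^T.
assert np.allclose(grad, A.T @ B.T, atol=1e-6)
```

Because $f$ is linear in $X$, the central difference is exact up to floating-point error, so the comparison is tight.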


Since you took a course covering tensors, you are undoubtedly familiar with the index summation convention. If we let $\partial_{ij}$ denote the derivative with respect to $X_{ij}$, then the derivation is quite succinct
$$\eqalign{ f &= A_{ij}X_{jk}B_{ki} \cr \partial_{mp}f &= A_{ij}\,(\partial_{mp}X_{jk})\,B_{ki} \cr &= A_{ij}\,(\delta_{mj}\delta_{pk})\,B_{ki} \cr &= A_{im}B_{pi} \cr &= A^T_{mi}B^T_{ip} \cr \frac{\partial f}{\partial X} &= A^TB^T \cr\cr }$$
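The delta contraction in the middle step can be verified mechanically with `np.einsum`, which implements exactly this index notation: contracting $A_{ij}\,\delta_{mj}\delta_{pk}\,B_{ki}$ over $i,j,k$ leaves $A_{im}B_{pi}$ with free indices $(m,p)$, which is $[A^TB^T]_{mp}$. A short sketch (matrix size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A, B = rng.standard_normal((2, n, n))

# A_{im} B_{pi}, summed over i, with free indices (m, p):
grad = np.einsum('im,pi->mp', A, B)

# This agrees entrywise with [A^T B^T]_{mp}.
assert np.allclose(grad, A.T @ B.T)
```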