So, I took one intro course in tensor calculus, and this problem reminds me of that, except I can't quite recall how derivatives with respect to components work, or what those derivatives produce. Consider the following example:
The transition from (7) to (8) is where I'm lost. We have
$$ \frac{\partial f}{\partial X_{jk}} = \frac{\partial}{\partial X_{jk}} \sum_i \sum_j \sum_k A_{ij}X_{jk}B_{ki} $$
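To see concretely what the function being differentiated is, here is a minimal pure-Python sketch (plain lists of lists, no libraries; the matrices are arbitrary illustrative values, not taken from the question, and the helper names are hypothetical). It evaluates the triple sum literally and compares it with $\operatorname{tr}(AXB)$, since $\sum_i\sum_j\sum_k A_{ij}X_{jk}B_{ki}=\sum_i (AXB)_{ii}$.

```python
# Minimal sketch: evaluate f(X) = sum_i sum_j sum_k A_ij X_jk B_ki literally
# and compare it with tr(AXB). Helper names are illustrative only.

def f_triple_sum(A, X, B):
    """Literal triple sum over i, j, k for n x n matrices."""
    n = len(X)
    return sum(A[i][j] * X[j][k] * B[k][i]
               for i in range(n) for j in range(n) for k in range(n))

def matmul(P, Q):
    """Product of two n x n matrices stored as lists of lists."""
    n = len(P)
    return [[sum(P[r][m] * Q[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def trace(P):
    """Sum of the diagonal entries."""
    return sum(P[r][r] for r in range(len(P)))

# Hypothetical example values.
A = [[1, 2], [3, 4]]
X = [[0, 1], [5, -2]]
B = [[2, -1], [1, 3]]

print(f_triple_sum(A, X, B))           # -18
print(trace(matmul(matmul(A, X), B)))  # -18, the same number via tr(AXB)
```

Integer entries keep the comparison exact, so equality can be tested with `==` rather than a tolerance.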
Now, the ultimate goal here is to find $\frac{\partial f}{\partial X}$, so why did we choose to differentiate with respect to $X_{jk}$, i.e. on the indices $(j,k)$? Clearly that's a convenient choice, but why is it acceptable, and what is the intuition or motivation for it? (This is my main question.) Given that it is the correct choice, I believe the derivative and subsequent work go roughly like this:
\begin{align} \frac{\partial}{\partial X_{jk}} \sum_i \sum_j \sum_k A_{ij}X_{jk}B_{ki} &= \sum_i \sum_j \sum_k A_{ij}\frac{\partial}{\partial X_{jk}}\left(X_{jk}\right)B_{ki} \\ &= \sum_i \sum_j \sum_k A_{ij}1_{jk}B_{ki} \\ &= \sum_i \sum_j A_{ij} \sum_k 1_{jk}B_{ki} \tag{$\star$} \\ &= \sum_i \sum_j A_{ij}B_{ji} \end{align}
But my steps in and around $(\star)$ are not legitimate. I sense that there should be a Kronecker delta somewhere in there. In tensor (Einstein summation) notation I get:
\begin{align} \frac{\partial}{\partial X_{jk}} A_{ij}X_{jk}B_{ki} &= A_{ij}\frac{\partial}{\partial X_{jk}}\left(X_{jk}\right)B_{ki} \\ &= A_{ij}B_{ki} \\ &= \left[BA\right]_{kj} \\ &= \left[BA\right]^T_{jk} \\ &= \left[A^TB^T\right]_{jk} \end{align}
which are the components of $A^TB^T$. So that seems to work (am I correct?). If so, how do I reconcile the two notations so that I can perform the same calculation with explicit summation signs?
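The contraction in the tensor-notation computation can be checked directly. The minimal sketch below (pure Python; `contract`, `matmul` and `transpose` are hypothetical helper names, and the matrices are arbitrary) builds the array whose $(j,k)$ entry is $A_{ij}B_{ki}$ summed over the repeated index $i$, and compares it with both $A^TB^T$ and $(BA)^T$.

```python
# Sketch of the contraction A_ij B_ki with the repeated index i summed.
# The free indices are j and k, so the result is an n x n array D with
# D[j][k] = sum_i A[i][j] * B[k][i]. Helper names are illustrative only.

def contract(A, B):
    n = len(A)
    return [[sum(A[i][j] * B[k][i] for i in range(n)) for k in range(n)]
            for j in range(n)]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[r][m] * Q[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def transpose(P):
    return [list(row) for row in zip(*P)]

# Hypothetical example values.
A = [[1, 2], [3, 4]]
B = [[2, -1], [1, 3]]

D = contract(A, B)
print(D)                                        # [[-1, 10], [0, 14]]
print(D == matmul(transpose(A), transpose(B)))  # True: entries of A^T B^T
print(D == transpose(matmul(B, A)))             # True: entries of (BA)^T
```

The order of the loops in `contract` makes the free index $j$ the row and $k$ the column, which is exactly why the result is $[BA]_{kj}$ read transposed.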

I can see the confusion in the notation, so let's change things up a little bit to try and make it clearer.
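First, let's note what our coordinates are. For convenience, all indices range from $1$ to $n$. We have coordinates of the form $(X_{ij})=(X_{ij}, 1\leq i,j\leq n)$, where each $X_{ij}$ is the function that reads off the $(i,j)$ entry of a matrix. So for fixed matrices $A$ and $B$, we have $X_{ij}(A)=A_{ij}$ and $X_{ij}(B)=B_{ij}$. In coordinates, our function $f(X)$ is written as $$f((X_{ij}))=\sum_i\sum_j\sum_kA_{ij}X_{jk}B_{ki}.$$

Now let's differentiate with respect to a fixed coordinate $X_{ab}$. The new letters $a,b$ name one particular entry, whereas $j,k$ are summation indices that run over every entry, so the two must be kept separate. Then
\begin{align*} \frac{\partial f}{\partial X_{ab}}&=\frac{\partial}{\partial X_{ab}}\sum_i\sum_j\sum_kA_{ij}X_{jk}B_{ki}\\ &=\sum_i\sum_j\sum_kA_{ij}B_{ki}\frac{\partial X_{jk}}{\partial X_{ab}}\\ &=\sum_i\sum_j\sum_kA_{ij}B_{ki}\delta_{aj}\delta_{bk}\\ &=\sum_iA_{ia}B_{bi}\\ &=\sum_iB_{bi}A_{ia}\\ &=X_{ba}(BA)\\ &=X_{ab}((BA)^T)\\ &=X_{ab}(A^TB^T). \end{align*}

The third line uses $\frac{\partial X_{jk}}{\partial X_{ab}}=\delta_{aj}\delta_{bk}$: distinct coordinates are independent, so $X_{jk}$ depends on $X_{ab}$ only when $j=a$ and $k=b$. The deltas then remove every term of the $j$- and $k$-sums except $j=a$, $k=b$, which gives the fourth line.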
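The delta step can be mirrored literally. This minimal sketch (pure Python; the function names and the $3\times 3$ matrices are hypothetical, chosen only for illustration) evaluates the full triple sum with $\partial X_{jk}/\partial X_{ab}$ replaced by $\delta_{aj}\delta_{bk}$ and checks that it agrees with the collapsed sum $\sum_i A_{ia}B_{bi}$ for every fixed $(a,b)$.

```python
# Sketch: differentiate f with respect to one fixed entry X_ab by writing
# dX_jk/dX_ab = delta_aj * delta_bk inside the full triple sum, then compare
# with the collapsed form sum_i A_ia B_bi. Names are illustrative only.

def delta(p, q):
    """Kronecker delta."""
    return 1 if p == q else 0

def df_dXab_with_deltas(A, B, a, b):
    # a, b are fixed; i, j, k are summation indices, kept distinct from a, b.
    n = len(A)
    return sum(A[i][j] * B[k][i] * delta(a, j) * delta(b, k)
               for i in range(n) for j in range(n) for k in range(n))

def df_dXab_collapsed(A, B, a, b):
    # Only the terms with j = a and k = b survive the deltas.
    n = len(A)
    return sum(A[i][a] * B[b][i] for i in range(n))

# Hypothetical example values.
A = [[1, 2, 0], [3, 4, -1], [2, 0, 5]]
B = [[2, -1, 1], [1, 3, 0], [0, 2, -2]]

size = len(A)
for a in range(size):
    for b in range(size):
        assert df_dXab_with_deltas(A, B, a, b) == df_dXab_collapsed(A, B, a, b)
print("the deltas collapse the j- and k-sums for every (a, b)")
```

Note that the fixed indices `a`, `b` are function parameters while `j`, `k` are loop variables, which is the same separation the fresh letters enforce on paper.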
Now, $$\frac{\partial f}{\partial X_{ab}}=X_{ab}\left(\frac{\partial f}{\partial X}\right)$$ by definition, and hence $$\frac{\partial f}{\partial X}=A^TB^T.$$
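As an independent check of $\frac{\partial f}{\partial X}=A^TB^T$ (a finite-difference technique swapped in here, not part of the derivation above), one can bump one entry of $X$ at a time. Because $f$ is linear in $X$, the forward difference $f(X+E_{ab})-f(X)$, where $E_{ab}$ is the matrix with a $1$ in position $(a,b)$ and zeros elsewhere, equals $\partial f/\partial X_{ab}$ exactly, with no step-size error. The helper names and matrices below are hypothetical.

```python
# Sketch: check df/dX = A^T B^T by perturbing one entry of X at a time.
# f is linear in X, so the forward difference with step 1 is exact
# (integer arithmetic, no rounding). Names are illustrative only.

def f(A, X, B):
    n = len(X)
    return sum(A[i][j] * X[j][k] * B[k][i]
               for i in range(n) for j in range(n) for k in range(n))

def matmul(P, Q):
    n = len(P)
    return [[sum(P[r][m] * Q[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def transpose(P):
    return [list(row) for row in zip(*P)]

def grad_by_differences(A, X, B):
    n = len(X)
    base = f(A, X, B)
    G = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            Xp = [row[:] for row in X]  # copy X
            Xp[a][b] += 1               # bump only the (a, b) entry
            G[a][b] = f(A, Xp, B) - base
    return G

# Hypothetical example values; any X gives the same G since f is linear.
A = [[1, 2], [3, 4]]
B = [[2, -1], [1, 3]]
X = [[0, 1], [5, -2]]

G = grad_by_differences(A, X, B)
print(G)                                        # [[-1, 10], [0, 14]]
print(G == matmul(transpose(A), transpose(B)))  # True
```

For a nonlinear $f$ the same loop would only approximate the gradient and a small floating-point step would be needed; linearity is what makes the integer step of $1$ exact here.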