Imagine I have three $N\times N$ matrices: A, B and C. How could I write the matrix multiplication $X = ABC$ as a sum s.t. $X_{ij} = \sum something ?$.
I tried to write out $AB = D$ first: $D_{ij} = \sum_k A_{ik}B_{kj}$ and then use the associative property s.t. $X_{ij} = [DC]_{ij} = \sum_{l}D_{il}C_{lj} = \sum_{l}(\sum_{k}A_{ik}B_{kl})C_{lj}$
Is this correct? Is there an easier way to write it down?
I ask because I have a question about finding the derivative of the trace of $ABC$ w.r.t $B$ and I need this as my first step.
This is correct. There's not really any easier way of writing it down. There is some intuition, though: $A_{ik}B_{kl}{C_{lj}}$ can be thought of as a "walk" from $i$ to $j$ with three steps $i \rightarrow k \rightarrow l \rightarrow j$. Then $(ABC)_{ij}$ is given by taking the sum over all "walks" from $i$ to $j$, where each walk is "weighted" by the entries $A_{ik}B_{kl}{C_{lj}}$. Similarly, $$(ABCD)_{ij} = \sum_{i_1} \sum_ {i_2}\sum_{i_3} A_{i, i_1} B_{i_1, i_2} C_{i_2, i_3} D_{i_3, j}$$ is given by taking the sum over all walks from $i$ to $j$ with four steps.
If you write it as a single sum, and generalize it to arbitrary products, the notation gets a little trickier. Let $A^1, A^2, \ldots, A^l$ be $n \times n$ matrices; note that the superscripts here are not powers, just indices on the tuple - we need them because we'll reserve subscripts for the row and column indices. Define a walk $\gamma$ from $i$ to $j$, of length $l$, as a tuple $(i, i_1, i_2, i_3, \ldots, i_{l-1}, j)$, with each entry between $1$ and $n$, and let $P_l$ be the set of all walks $\gamma$ of length $l$. Define the weight of $\gamma \in P_l$ to be $$w(\gamma) = A^1_{i,i_1} A^2_{i_1, i_2} A^3_{i_3, i_4} \cdots A^{l}_{i_{l-1}, j}.$$ Then $$(A^1 A^2 \cdots A^l)_{ij} = \sum_{\gamma \in P_l} w(\gamma).$$
This idea is used frequently in graph theory (the incidence matrix) and probability (Markov chains).