Is there any way (i.e. a formula) to compute the derivative of the summation in the second step below without having to separate the summation by cases according to whether particular indices are equal to $i$ or $j$.
$$ \frac{\partial}{\partial X_{ij}} trace\{X^TCXN\} = \frac{\partial}{\partial X_{ij}} \sum_k\sum_l\sum_m\sum_n X_{lk}C_{lm}X_{mn}N_{nk}$$ $$= \sum_m\sum_n C_{im}X_{mn}N_{nj} + \sum_k\sum_l X_{lk}C_{li}N_{jk}$$ $$=(CXN)_{ij} + (C^TXN^T)_{ij}$$
For example how I analysed it was by separating the sum over $k$ into a sum over $k\neq j$ and a sum with $j=k$ and then continued separating summations like this until I could evaluate the derivative.
This worked and I got the correct answer, however it is very time consuming.
Is there a formula I can use when I have sums such as these that can help me compute the derivative quickly without having to separate into so many individual summations? Particularly what makes it difficult is that there are at least two occurrences of the variable $X$ with given indices and the indices summed over for the different occurrences are not the same indices.
The reason I ask is this was presented as a single step in a solution and I'm curious if there is another way besides the way I did it which was very time consuming.
EDIT: For example in computing the derivative of $Tr\{X^TXX^TX\}$ with respect to $X$, which I've been working through, I have eight summations. It would be great if there was a better way.
Getting a handle on how to use the total derivative can knock these problems over in a flash.
Let $f \colon M_n(\mathbb{R}) \to M_n(\mathbb{R})$ be the function $f(X) = X^T C X N$. Then its derivative at $X \in M_n(\mathbb{R})$ in the direction $D \in M_n(\mathbb{R})$ can be found as the coefficient of $t$ in $$ \begin{aligned} f(X + tD) &= (X + tM)^T C (X + tD) N \\ &= f(X) + t(D^T C X N + X^T C D N) + t^2(...), \end{aligned} $$ hence $df_X(D) = D^T C X N + X^T C D N$. The derivative of the trace function is $d \operatorname{tr}_X(D) = \operatorname{tr}(D)$, so by the chain rule we get that $$ d(\operatorname{tr} \circ f)_X = \operatorname{tr}(D^T C X N) + \operatorname{tr}(X^T C D N).$$ To take the partial derivative with respect to $X_{ij}$, we really mean to take the total derivative at the point $X$, evaluated on the coordinate matrix $E_{ij}$: $$ \begin{aligned} d(\operatorname{tr} \circ f)_X: E_{ij} &\mapsto \operatorname{tr}(E_{ji} C X N) + \operatorname{tr}(X^T C E_{ij} N) \\ &= (C X N)_{ij} + (N X^T C)_{ij}. \end{aligned}$$