Let $A, B, X \in \mathbb{R}^{n \times n}$ and assume that $X^{-1}$ exists. Derive $\frac{\partial K}{\partial X}$, where $K(X)= \text{Tr}[B X^T A X^{-1}]$.
I have tried the following so far ($U = B X^T A X^{-1}, K = \text{Tr}[U]$):
$$\eqalign{ \frac{\partial K}{\partial X} &= \frac{\partial K}{\partial U} \frac{\partial U}{\partial X} \\ &= \frac{\partial \text{Tr}[U]}{\partial U} \frac{\partial U}{\partial X} \\ &= I_n \frac{\partial U}{\partial X} = \frac{\partial U}{\partial X} \\ &= B X^T \frac{\partial A X^{-1}}{\partial X} + A X^{-1} \frac{\partial B X^T}{\partial X} \\ }$$ But now I lack the tools to compute the matrix-by-matrix derivatives.
Let's use a colon to denote the trace/Frobenius product, i.e. $$A:B={\rm Tr}(A^TB)$$ The cyclic property of the trace, together with its invariance under transposition, allows the terms in a trace product to be rearranged in lots of ways, e.g. $$\eqalign{ A:B &= A^T:B^T &= B:A \\ A:BC &= B^TA:C &= AC^T:B \\ }$$ Write the function using the trace product, then calculate its differential and gradient. The function can be written in two equivalent ways, one isolating $X$ and one isolating $X^{-1}$, and the product rule contributes one term to $dK$ for each. $$\eqalign{ K &= AX^{-1}B:X \;= A^TXB^T:X^{-1} \\ dK &= AX^{-1}B:dX + A^TXB^T:dX^{-1} \\ &= AX^{-1}B:dX - A^TXB^T:X^{-1}dX\,X^{-1} \\ &= AX^{-1}B:dX - X^{-T}A^TXB^TX^{-T}:dX \\ &= \Big(AX^{-1}B - X^{-T}A^TXB^TX^{-T}\Big):dX \\ \frac{\partial K}{\partial X} &= AX^{-1}B \;-\; X^{-T}A^TXB^TX^{-T} \\ }$$
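As a quick sanity check of the gradient formula (not part of the derivation itself), it can be compared against central finite differences. This is a minimal sketch assuming NumPy; the helper names `K`, `grad_K` and `numerical_grad`, the size `n = 4`, the random seed, and the diagonal shift that keeps $X$ comfortably invertible are all illustrative choices of mine.

```python
import numpy as np

def K(X, A, B):
    """K(X) = Tr[B X^T A X^{-1}]."""
    return np.trace(B @ X.T @ A @ np.linalg.inv(X))

def grad_K(X, A, B):
    """Closed-form gradient: A X^{-1} B - X^{-T} A^T X B^T X^{-T}."""
    Xinv = np.linalg.inv(X)
    return A @ Xinv @ B - Xinv.T @ A.T @ X @ B.T @ Xinv.T

def numerical_grad(f, X, h=1e-6):
    """Central differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
n = 4  # illustrative size
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
# Assumption: shift the diagonal so that X is well away from singular.
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)

G_exact = grad_K(X, A, B)
G_num = numerical_grad(lambda Y: K(Y, A, B), X)
max_err = np.max(np.abs(G_exact - G_num))
print("max abs difference:", max_err)
```

The printed difference should be at the level of finite-difference noise; a sign error or a misplaced transpose in the formula would show up as a difference comparable in size to the gradient entries themselves.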
The derivation above used the differential of $X^{-1}$; here's how it is obtained. $$\eqalign{ I &= X^{-1}X \\ 0 &= dX^{-1}X + X^{-1}dX \\ 0 &= dX^{-1} + X^{-1}dX\,X^{-1} \\ dX^{-1} &= -X^{-1}dX\,X^{-1} \\ }$$
As you have discovered, the problem with the chain rule in matrix calculus is that it very often requires the calculation of intermediate quantities which are higher-order tensors, e.g. matrix/matrix, matrix/vector, and vector/matrix derivatives.
The differential approach is simpler because the differential of a matrix is just another matrix and obeys the rules of matrix algebra.
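The rule $dX^{-1} = -X^{-1}dX\,X^{-1}$ used in the derivation can be checked the same way, by comparing the actual change in the inverse under a small perturbation with the first-order prediction. Again a rough sketch assuming NumPy, with an arbitrary seed, size, and perturbation scale chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # illustrative size
# Assumption: diagonal shift keeps X well away from singular.
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))  # small perturbation

Xinv = np.linalg.inv(X)
actual = np.linalg.inv(X + dX) - Xinv   # true change in the inverse
first_order = -Xinv @ dX @ Xinv         # prediction from the differential

rel_err = np.linalg.norm(actual - first_order) / np.linalg.norm(first_order)
print("relative error:", rel_err)
```

The two matrices should agree up to terms of second order in $dX$, so the relative error is expected to be roughly on the order of the perturbation size.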