I'm trying to differentiate a matrix expression, but I've run into a strange issue with dimensionality.
Let's say that we have the matrix $A$, which is $3\times3$, and $B$, which is $3\times10$. The matrix $AB$, then, is also $3\times10$, and its derivative with respect to $A$ is
$\frac{\partial}{\partial A}AB = B^T$, which is $10\times3$.
On the other hand, if we have some scalar function $f$ that acts on matrices element-wise, the matrix $f(AB)$ has the same dimensions as $AB$. However, when I try to differentiate it with respect to $A$ by applying the chain rule, I get:
$\frac{\partial}{\partial A}f(AB) = f'(AB)\cdot B^T$, which yields a $3\times 3$ matrix, so it seems that we get a matrix of different dimensions after differentiating a similar matrix with respect to the same thing.
What am I missing? Maybe there is something wrong with the point-wise derivative of the scalar function $f$ that makes the chain rule inapplicable in this case? I'm not sure.
Edit:
I now realize that what I really wanted to ask is how to differentiate the trace of the above expression. With the thorough, detailed explanation given in the answer, I realize that it should be:
$\frac{\partial}{\partial A}Tr(f(AB)C) = \frac{\partial}{\partial F}Tr(FC)\frac{\partial}{\partial A}f(AB) = C^T \cdot G:\mathcal{M}\cdot B^T$,
where $F:= f(AB), G:= f'(AB)$.
Is there a way to avoid using higher-order tensors in this case? The answer should be $3\times 3$, since we are differentiating a scalar.
Thanks a lot
$\def\s{{\rm size}}\def\E{{\cal E}}\def\M{{\cal M}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Because the matrix-by-matrix gradient $\left(\p{\,(A\cdot B)}{A} \right)$ is not a matrix but a fourth-order tensor, in order to proceed we need to introduce some tensors.
In particular, we need the sixth-order tensor ${\M}$, whose components ${\M}_{ijk\ell mn}$ are unity if $\,(i=k=m)$ and $(j=\ell=n),\,$ but zero otherwise.
This tensor makes it possible to replace a Hadamard product with a pair of Frobenius products
$$\eqalign{ A\odot Z &= A:\M:Z = Z\odot A \\ }$$
We'll also need the fourth-order tensor $\E$ defined such that
$$\eqalign{ &\E_{ijk\ell} = \delta_{ik}\delta_{j\ell} &\big({\rm Kronecker\,deltas}\big) \\ &\E:A = A:\E = A &\big(\E\,{\rm is\,identity\,for\,}:\!\big) \\ &C\cdot A\cdot B = \left(C\cdot\E\cdot B^T\right):A\quad &\big({\rm Reordering\,property}\big) \\ }$$
Finally, define the matrices which result from applying the scalar function $\phi(x)$ (the question's $f$) and its derivative $\phi'(x)$ element-wise to a matrix argument $X=A\cdot B$
$$\eqalign{ F &= \phi(X) \qquad\quad G &= \phi'(X) }$$
Now calculate the differential and gradient of $F$
$$\eqalign{ dF &= G\odot dX \\ &= G:\M:(dA\cdot B) \\ &= G:\M:(\E\cdot B^T:dA) \\ &= (G:\M:\E\cdot B^T):dA \\ &= (G:\M\cdot B^T):dA \\ \p{F}{A} &= G:\M\cdot B^T \\ }$$
In the case that $\phi(x)=x$
$$\eqalign{ \phi'(x)&=1 \quad\implies\quad F=X=A\cdot B,\quad&G=J={\rm all\,ones\,matrix} \\ \p{F}{A} &= J:\M:(\E\cdot B^T) \\ &= J\odot(\E\cdot B^T) \qquad&\big(J\,{\rm is\,identity\,for\,}\odot\!\big) \\ &= \E\cdot B^T \\ }$$
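The gradient formula above can be checked numerically. The following is only a minimal NumPy sketch, not part of the original derivation: it assumes $\phi=\tanh$ as a stand-in scalar function, random $A$ and $B$ of the sizes in the question, and builds $\M$ and $\E$ from Kronecker deltas with `np.einsum`. It compares $G:\M\cdot B^T$ against central finite differences, and also checks the $\phi(x)=x$ special case.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 10))

# Assumption for illustration: phi = tanh, so phi'(x) = 1 - tanh(x)^2.
phi = np.tanh
dphi = lambda x: 1.0 - np.tanh(x) ** 2

X = A @ B
G = dphi(X)

I3, I10 = np.eye(3), np.eye(10)
# E[i,j,k,l] = delta_ik * delta_jl
E = np.einsum('ik,jl->ijkl', I3, I10)
# M[i,j,k,l,m,n] = 1 iff i = k = m and j = l = n
M = np.einsum('ik,im,jl,jn->ijklmn', I3, I3, I10, I10)

# Hadamard product written as a pair of double-dot products: A:M:Z
Z = rng.standard_normal((3, 10))
hadamard_via_M = np.einsum('ij,ijklmn,mn->kl', G, M, Z)

# dF/dA = G:M.B^T  (":" contracts i,j; "." contracts n with B^T's first index)
grad = np.einsum('ij,ijklmn,np->klmp', G, M, B.T)

# Central finite differences: fd[k,l,m,p] approximates dF_kl / dA_mp
eps = 1e-6
fd = np.zeros_like(grad)
for m in range(3):
    for p in range(3):
        dA = np.zeros_like(A)
        dA[m, p] = eps
        fd[:, :, m, p] = (phi((A + dA) @ B) - phi((A - dA) @ B)) / (2 * eps)
max_err = np.abs(grad - fd).max()

# Special case phi(x) = x: G is the all-ones matrix J and dF/dA = E.B^T
J = np.ones((3, 10))
grad_linear = np.einsum('ij,ijklmn,np->klmp', J, M, B.T)
E_dot_Bt = np.einsum('klmn,np->klmp', E, B.T)

print(grad.shape, max_err)
```

The index order `klmp` means `grad[k, l, m, p]` holds $\partial F_{k\ell}/\partial A_{mp}$, which is why the result has four indices rather than two.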
In the above, single-dot and double-dot (aka Frobenius) products are used, which are defined as $$\eqalign{ (A\cdot B)_{ik} &= \sum_{j=1}^n A_{ij} B_{jk} \\ A:Z &= \sum_{i=1}^m \sum_{j=1}^n A_{ij} Z_{ij} \;=\; {\rm Tr}(AZ^T) \\ }$$ These products have a straightforward extension to higher-order tensors, i.e. the rightmost indices of the term on the left are summed against the leftmost indices of the term on the right (one index for the single-dot product, two for the double-dot product).
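One way to experiment with these products is `np.tensordot`, whose `axes=1` and `axes=2` modes contract the last one or two indices of the left operand against the first one or two of the right operand. This is a small sketch under assumed random inputs; the $3\times3\times3\times3$ identity tensor used here is chosen only so that it can act on a $3\times3$ matrix from both sides.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
Z = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 10))

# Single-dot product: last index of A summed against first index of B.
A_dot_B = np.tensordot(A, B, axes=1)       # equals A @ B

# Double-dot (Frobenius) product: both indices summed, giving a scalar.
A_ddot_Z = np.tensordot(A, Z, axes=2)      # equals Tr(A Z^T)

# Extension to higher order: a fourth-order identity for 3x3 matrices,
# E[i,j,k,l] = delta_ik * delta_jl (illustrative size, see lead-in).
I3 = np.eye(3)
E = np.einsum('ik,jl->ijkl', I3, I3)
E_ddot_A = np.tensordot(E, A, axes=2)      # E:A
A_ddot_E = np.tensordot(A, E, axes=2)      # A:E

print(float(A_ddot_Z), np.trace(A @ Z.T))
```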
It might be helpful to jot down the dimensions of the various quantities $$\eqalign{ {\tt3,3} &= \s(A) \\ {\tt3,10} &= \s(B) =\s(F) =\s(G) =\s(X) \\ {\tt3,10,3,10} &= \s(\E) \\ {\tt3,10,3,10,3,10} &= \s(\M) \\ {\tt3,10,3,3} &= \s\!\left(\p{F}{A}\right) \\ }$$
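The sizes listed above can be confirmed with a short NumPy sketch, which also touches on the trace from the question's edit. Everything here is an assumption for illustration: $\phi=\tanh$, random inputs, and a hypothetical $10\times3$ matrix `C`. Contracting the fourth-order gradient with $C^T$ gives a $3\times3$ result, and the same differential argument ($d\,{\rm Tr}(FC) = C^T:(G\odot dX)$) suggests the matrix-only form $(G\odot C^T)\cdot B^T$, which the sketch compares against finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 10))
C = rng.standard_normal((10, 3))   # hypothetical C, as in Tr(f(AB) C)

# Assumption for illustration: phi = tanh.
phi = np.tanh
dphi = lambda x: 1.0 - np.tanh(x) ** 2

def trace_fn(A_):
    """Scalar function Tr(phi(A B) C)."""
    return np.trace(phi(A_ @ B) @ C)

G = dphi(A @ B)
I3, I10 = np.eye(3), np.eye(10)
E = np.einsum('ik,jl->ijkl', I3, I10)
M = np.einsum('ik,im,jl,jn->ijklmn', I3, I3, I10, I10)
dF_dA = np.einsum('ij,ijklmn,np->klmp', G, M, B.T)

shapes = {'A': A.shape, 'B': B.shape, 'G': G.shape,
          'E': E.shape, 'M': M.shape, 'dF_dA': dF_dA.shape}

# Route 1: contract the fourth-order gradient with C^T over (k, l).
grad_tensor = np.einsum('kl,klmp->mp', C.T, dF_dA)

# Route 2: matrices only, (G ⊙ C^T) · B^T.
grad_matrix = (G * C.T) @ B.T

# Central finite differences of the scalar function.
eps = 1e-6
grad_fd = np.zeros_like(A)
for m in range(3):
    for p in range(3):
        dA = np.zeros_like(A)
        dA[m, p] = eps
        grad_fd[m, p] = (trace_fn(A + dA) - trace_fn(A - dA)) / (2 * eps)

print(shapes)
print(np.abs(grad_matrix - grad_fd).max())
```

Route 2 never forms $\M$ or $\E$, so for a scalar-valued function like this trace the higher-order tensors can be sidestepped in practice.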