I need help verifying whether my matrix-differentiation derivation is correct.
$\alpha_{n\times n \times m}$ is a tensor; $F(w)_{n \times m}$ and $Z_{n \times d}$ are matrices independent of $\alpha$; and $\beta_{n \times m}$ is a matrix with entries $\beta_{ij} = \sum_{k=1}^{n} \alpha_{ikj}$.
$$ \begin{equation}\label{eqn:alphaGradFw11Vec} \begin{split} \frac{\partial}{\partial \alpha}Tr\left[ F(w)\left(\beta Z^{\top} Z\right)^{\top} \right] &= \left(\frac{\partial}{\partial \alpha} \left(F(w)\left(\beta Z^{\top}Z\right)^{\top}\right)\right)^{\top} \\ &= \left(F(w) Z^{\top} Z \left(\frac{\partial}{\partial \alpha}\beta^{\top}\right) \right)^{\top} \\ &= \left(\frac{\partial}{\partial \alpha}\beta^{\top}\right)^{\top} \left(F(w) Z^{\top} Z \right)^{\top} \\ &= \left(\frac{\partial}{\partial \alpha}\beta\right) Z^{\top} Z\, F(w)^{\top} \end{split} \end{equation} $$
In the above I used the result $$\frac{\partial \left(Tr(g(\mathbf{X}))\right)}{\partial \mathbf{X}} = \left(g^{'}(\mathbf{X})\right)^{\top}$$
Is this derivation correct, and what is the dimensionality of the result?
Thanks in advance
$\def\o{{\large\tt1}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$For typing convenience, define $\o\in{\mathbb R}^{n}$ as the all-ones vector and the matrices (with summation implied over repeated indices) $$\eqalign{ M &= FZ^TZ \quad&\implies\quad M_{ij} = F_{ip}Z_{qp}Z_{qj} \\ \beta& \quad&\implies\quad \beta_{ij} = \o_k\,\alpha_{ikj} \\ }$$ NB: $\,$ For the equations to be dimensionally compatible, $Z^TZ$ must be $(m\times m)$, so $Z$ must be a $(d\times m)$ matrix, not $(n\times d)$ as stated in the question.
Write the trace function using index notation $$\eqalign{ T &= {\rm Tr}(FZ^TZ\beta^T) \\ &= M_{ij}\,\beta_{ij} \\ &= M_{ij}\,\o_k\,\alpha_{ikj} \\ }$$ Then calculate its differential/gradient. $$\eqalign{ dT &= M_{ij}\o_k\,d\alpha_{ikj} \\ \p{T}{\alpha_{ikj}} &= M_{ij}\o_k \;=\; F_{ip}Z_{qp}Z_{qj}\o_k \\ }$$ The gradient is a third-order tensor with the same $(n\times n\times m)$ shape as $\alpha$, so it cannot be expressed using standard matrix notation, but it is straightforward in index notation.
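As a sanity check, the index formula can be compared against finite differences. This is only a minimal NumPy sketch: the sizes `n, m, d`, the random test data, and the helper name `T` are illustrative assumptions, and $Z$ is taken as $(d\times m)$ per the NB above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 3, 2, 4                      # small illustrative sizes (assumed)

F = rng.standard_normal((n, m))        # stands in for F(w)
Z = rng.standard_normal((d, m))        # (d x m), so that Z^T Z is (m x m)
alpha = rng.standard_normal((n, n, m))

M = F @ Z.T @ Z                        # M_ij = F_ip Z_qp Z_qj, shape (n, m)

def T(a):
    """Tr[F (beta Z^T Z)^T] with beta_ij = sum_k a_ikj."""
    beta = a.sum(axis=1)
    return np.trace(F @ (beta @ Z.T @ Z).T)

# Analytic gradient from the index formula: dT/dalpha_ikj = M_ij * 1_k
grad = np.einsum('ij,k->ikj', M, np.ones(n))

# Central finite differences, perturbing one entry of alpha at a time
eps = 1e-4
num = np.zeros_like(alpha)
for idx in np.ndindex(alpha.shape):
    e = np.zeros_like(alpha)
    e[idx] = eps
    num[idx] = (T(alpha + e) - T(alpha - e)) / (2 * eps)

max_err = np.abs(grad - num).max()     # should be at round-off level
print(grad.shape, max_err)
```

Because $T$ is linear in $\alpha$, the central difference has no truncation error, so any discrepancy beyond round-off would point to a mistake in the formula.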
By defining a tensor with components $\,\gamma_{ikj}=M_{ij}\o_k,\,$ you can write an index-free equation $$\p{T}{\alpha} = \gamma$$
It may be of interest to calculate the gradient of $\beta$ with respect to $\alpha$. $$\eqalign{ \beta_{ij} &= \o_k\,\alpha_{ikj} \\ &= \o_k\delta_{ip}\delta_{jq}\,\alpha_{pkq} \\ d\beta_{ij} &= \o_k\delta_{ip}\delta_{jq}\;d\alpha_{pkq} \\ \p{\beta_{ij}}{\alpha_{pkq}} &= \o_k\delta_{ip}\delta_{jq} \;\doteq\; \Gamma_{ijpkq} \\ }$$ where $\delta_{ip}$ is the Kronecker delta. Note that this gradient $\left(\Gamma=\p{\beta}{\alpha}\right)\,$ is a fifth-order tensor!
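To make $\Gamma$ concrete, here is a small NumPy sketch (sizes and random data are illustrative assumptions) that builds $\Gamma_{ijpkq}$ from its definition and checks that contracting it with $\alpha$ over $p,k,q$ reproduces $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                            # illustrative sizes (assumed)
alpha = rng.standard_normal((n, n, m))

ones = np.ones(n)
# Gamma_ijpkq = 1_k delta_ip delta_jq, shape (n, m, n, n, m)
Gamma = np.einsum('k,ip,jq->ijpkq', ones, np.eye(n), np.eye(m))

beta = alpha.sum(axis=1)                               # beta_ij = sum_k alpha_ikj
beta_from_Gamma = np.einsum('ijpkq,pkq->ij', Gamma, alpha)

print(Gamma.shape, np.allclose(beta, beta_from_Gamma))
```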
You can nevertheless use $\Gamma$ to calculate the original derivative with double-dot ($:$) and triple-dot ($\therefore$) products instead of index notation $$\eqalign{ T &= M:\beta \\ dT &= M:d\beta \\ &= M:\big(\Gamma\therefore d\alpha\big) \\ \p{T}{\alpha} &= M:\Gamma \\ }$$
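The double-dot route can be checked the same way. In this hedged NumPy sketch (sizes and random data are assumptions for illustration), $M:\Gamma$ is the contraction over the first two indices of $\Gamma$, and it should coincide with the tensor whose components are $M_{ij}\,1_k$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 3, 2, 4                      # illustrative sizes (assumed)
F = rng.standard_normal((n, m))
Z = rng.standard_normal((d, m))        # (d x m), so that Z^T Z is (m x m)
M = F @ Z.T @ Z                        # shape (n, m)

ones = np.ones(n)
Gamma = np.einsum('k,ip,jq->ijpkq', ones, np.eye(n), np.eye(m))

# M : Gamma contracts i and j, leaving a third-order tensor indexed by (p, k, q)
grad_via_Gamma = np.einsum('ij,ijpkq->pkq', M, Gamma)

# Direct index formula: gamma_ikj = M_ij * 1_k
gamma = np.einsum('ij,k->ikj', M, ones)

print(grad_via_Gamma.shape, np.allclose(grad_via_Gamma, gamma))
```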