Let's assume I have a matrix $B \in \mathbb{R}^{m \times r}$, a matrix $P \in \mathbb{R}^{r \times r}$ and a diagonal matrix $A \in \mathbb{R}^{r \times r}$ defined by
$$A_{ii}=a_i \quad \forall\, i=1,\dots,r$$
I am stuck computing the derivative of $trace(BAP(BA)^T)$ with respect to each $a_i$. I would say that it is $$sum(B(P+P^T)*B,1)$$ where $*$ means "element-wise product" and $sum(.,1)$ is the sum of each column. However, I am not sure this is correct. I let $X=BA$, used this rule to differentiate $trace(XPX^T)$ (it gives $X*(P+P^T)$ if I am right), and then applied the chain rule. Could someone help me check this result and justify it?
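From your previous question we know the gradient of the function $\psi = {\rm trace}\big(BAP(BA)^T\big)$ with respect to an unconstrained matrix $A$: $$G = \frac{\partial\psi}{\partial A} = B^TBA(P+P^T)$$ Writing $:$ for the Frobenius product, $X:Y = {\rm trace}(X^TY)$, we can expand the differential and change variables via $A = {\rm Diag}(a)$ to obtain the gradient with respect to the vector $a$. $$\eqalign{ d\psi &= G:dA = G:{\rm Diag}(da) = {\rm diag}(G):da \cr g &= \frac{\partial\psi}{\partial a} = {\rm diag}(G) \cr }$$ Componentwise, $g_i = \big[B^TBA(P+P^T)\big]_{ii}$. In your notation this is $sum\big((BA(P+P^T))*B,\,1\big)$, so your expression is missing the factor $A$.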
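As a sanity check, here is a minimal NumPy sketch that compares ${\rm diag}(G)$ against central finite differences of $\psi$. It is my own illustration, not part of the derivation: the helper names `psi` and `grad_psi`, the sizes $m=5$, $r=3$, and the random test matrices are all assumptions made for the example.

```python
import numpy as np

def psi(a, B, P):
    """psi(a) = trace(B A P (B A)^T) with A = Diag(a)."""
    X = B * a  # broadcasting scales column i of B by a_i, i.e. X = B @ np.diag(a)
    return np.trace(X @ P @ X.T)

def grad_psi(a, B, P):
    """g = diag(B^T B A (P + P^T)), computed as column sums of an
    element-wise product so the r x r matrix G is never formed."""
    X = B * a
    return np.sum((X @ (P + P.T)) * B, axis=0)

# Hypothetical test data (sizes and seed chosen only for illustration).
rng = np.random.default_rng(0)
m, r = 5, 3
B = rng.standard_normal((m, r))
P = rng.standard_normal((r, r))
a = rng.standard_normal(r)

g = grad_psi(a, B, P)

# Reference 1: the formula exactly as written, g = diag(G).
g_full = np.diag(B.T @ B @ np.diag(a) @ (P + P.T))

# Reference 2: central finite differences in each coordinate a_i.
eps = 1e-6
g_fd = np.array([
    (psi(a + eps * e, B, P) - psi(a - eps * e, B, P)) / (2 * eps)
    for e in np.eye(r)
])

print(np.allclose(g, g_full), np.allclose(g, g_fd, atol=1e-6))
```

Since $\psi$ is quadratic in $a$, the central differences should match the analytic gradient up to rounding error. The column-sum form in `grad_psi` is the same as the `sum(.,1)` expression from the question once the factor $A$ is included.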