There is a scalar-by-matrix derivative identity:
$$\frac{\partial}{\partial X}trace\left(AXBX'C\right)=B'X'A'C'+BX'CA$$
How does this change if instead I am trying to find
$$\frac{\partial}{\partial x}trace\left(Adiag(x)Bdiag(x)'C\right)$$
where $x$ is a vector rather than a matrix.
My thinking is that all I have to do is multiply the original identity by a vector of ones as that would be the derivative of $diag(x)$. However, I'm not sure how the chain rule interacts with traces.
I ask because I am trying to calculate $$\frac{\partial}{\partial w}trace\left(Ddiag(w)\Omega diag(w)D'\right)$$
where $w \in \mathbb{R}^{N}$, $D \in \mathbb{R}^{M\times N}$, and $\Omega \in \mathbb{R}^{N\times N}$. Also, $\Omega$ can be assumed to be positive definite.
If my reasoning is right, the result would be
$$\left(2\Omega diag(w)D'D\right)e$$
where $e \in \mathbb{R}^{N}$ is a vector of ones.
Let $f : \mathbb R^n \to \mathbb R$ be defined by
$$f (\mathrm x) := \mbox{tr} \left( \mathrm A \, \mbox{diag} (\mathrm x) \, \mathrm B \, \mbox{diag} (\mathrm x) \, \mathrm C \right)$$
where $\mathrm A \in \mathbb R^{m \times n}$, $\mathrm B \in \mathbb R^{n \times n}$ and $\mathrm C \in \mathbb R^{n \times m}$ are given. The directional derivative of $f$ in the direction of $\mathrm v \in \mathbb R^n$ at $\mathrm x \in \mathbb R^n$ is given by
$$\begin{array}{rl} \displaystyle\lim_{h \to 0} \dfrac{f (\mathrm x + h \,\mathrm v) - f (\mathrm x)}{h} &= \mbox{tr} \left( \mathrm A \, \mbox{diag} (\mathrm v) \, \mathrm B \, \mbox{diag} (\mathrm x) \, \mathrm C \right) + \mbox{tr} \left( \mathrm A \, \mbox{diag} (\mathrm x) \, \mathrm B \, \mbox{diag} (\mathrm v) \, \mathrm C \right)\\ &= \mbox{tr} \left( \mbox{diag} (\mathrm v) \, \mathrm B \, \mbox{diag} (\mathrm x) \, \mathrm C \, \mathrm A \right) + \mbox{tr} \left( \mbox{diag} (\mathrm v) \, \mathrm C \, \mathrm A \, \mbox{diag} (\mathrm x) \, \mathrm B \right)\\ &= \mathrm v^\top \mbox{diag}^{-1} \left( \mathrm B \, \mbox{diag} (\mathrm x) \, \mathrm C \, \mathrm A \right) + \mathrm v^\top \mbox{diag}^{-1} \left( \mathrm C \, \mathrm A \, \mbox{diag} (\mathrm x) \, \mathrm B \right)\end{array}$$
where $\mbox{diag}^{-1} : \mathbb R^{n \times n} \to \mathbb R^n$ is a linear function that takes a square matrix and extracts its main diagonal as a column vector. Thus, the gradient of $f$ is
$$\nabla_{\mathrm x} f(\mathrm x) = \color{blue}{\mbox{diag}^{-1} \left( \mathrm B \, \mbox{diag} (\mathrm x) \, \mathrm C \, \mathrm A \right) + \mbox{diag}^{-1} \left( \mathrm C \, \mathrm A \, \mbox{diag} (\mathrm x) \, \mathrm B \right)}$$
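As a sanity check, the gradient formula can be compared against central finite differences. The following is only a minimal NumPy sketch: the helper names (`f`, `grad_f`, `numerical_grad`), the dimensions, and the random test matrices are illustrative assumptions, not part of the derivation. It also evaluates the special case from the question, $\mathrm A = D$, $\mathrm B = \Omega$, $\mathrm C = D'$ with $\Omega$ symmetric, where the two terms coincide and the gradient can be written as $2\,(\Omega \circ D'D)\,w$, with $\circ$ the entrywise (Hadamard) product.

```python
import numpy as np

def f(x, A, B, C):
    """f(x) = tr(A diag(x) B diag(x) C)."""
    return np.trace(A @ np.diag(x) @ B @ np.diag(x) @ C)

def grad_f(x, A, B, C):
    """Gradient: diag^{-1}(B diag(x) C A) + diag^{-1}(C A diag(x) B)."""
    CA = C @ A
    # np.diag builds a diagonal matrix from a vector and
    # extracts the main diagonal from a square matrix.
    return np.diag(B @ np.diag(x) @ CA) + np.diag(CA @ np.diag(x) @ B)

def numerical_grad(func, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (func(x + step) - func(x - step)) / (2 * h)
    return g

# Arbitrary test data (assumed sizes, fixed seed for reproducibility).
rng = np.random.default_rng(0)
m, n = 4, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, m))
x = rng.standard_normal(n)

g_formula = grad_f(x, A, B, C)
g_numeric = numerical_grad(lambda z: f(z, A, B, C), x)
print("general case agrees:", np.allclose(g_formula, g_numeric, atol=1e-5))

# Special case from the question: A = D, B = Omega, C = D'.
D = rng.standard_normal((m, n))
S = rng.standard_normal((n, n))
Omega = S @ S.T + n * np.eye(n)          # symmetric positive definite
w = rng.standard_normal(n)

g_w = grad_f(w, D, Omega, D.T)
g_hadamard = 2 * (Omega * (D.T @ D)) @ w  # 2 (Omega ∘ D'D) w
g_rowsum = 2 * Omega @ np.diag(w) @ D.T @ D @ np.ones(n)  # guess from the question

print("Hadamard form agrees:", np.allclose(g_w, g_hadamard))
print("ones-vector guess agrees:", np.allclose(g_w, g_rowsum))
```

Note that multiplying by a vector of ones sums each row of $\Omega\,\mbox{diag}(w)\,D'D$, whereas the gradient takes its main diagonal, so the two expressions differ in general.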