Differentials to derivatives involving trace of matrices


Suppose $P$ is a real-valued function of the $p\times m$ real matrix $\mathbf{Q}$. Taking its differential, one arrives at the following:

$$ d(P(\mathbf{Q})) = \operatorname{trace}\left\{\mathbf{1}^\top_p\left[ d\mathbf{Q}\odot \mathbf{W} \right]\left(\mathbf{w}\odot \mathbf{G}\right)\right\} $$ where $\mathbf{1}_p$ is the $p\times 1$ vector of ones, $\mathbf{W}$ is $p\times m$, and $\mathbf{w}$ and $\mathbf{G}$ are both $m \times 1$. $\mathbf{W}$, $\mathbf{w}$ and $\mathbf{G}$ all depend on $\mathbf{Q}$.

Question: What is $ \dfrac{dP}{d\mathbf{Q}} $ ?

Attempt: $ \dfrac{dP}{d\mathbf{Q}} = \mathbf{W} \left(\mathbf{w}\odot \mathbf{G}\right) \mathbf{1}_p $

But I think it's wrong. My difficulty is really the Hadamard product of $d\mathbf{Q}$ and $\mathbf{W}$.

Some identities I have found online are these:

$\dfrac{d(\mathbf{a}^\top\mathbf{X}\mathbf{b})} {d \mathbf{X}} = \mathbf{a} \mathbf{b}^\top$
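The first identity can be checked numerically. The sketch below assumes Python with NumPy (the question itself contains no code) and compares central finite differences of $\mathbf{a}^\top\mathbf{X}\mathbf{b}$ with $\mathbf{a}\mathbf{b}^\top$; `numerical_gradient` is a hypothetical helper introduced here, and it lays the gradient out with the same shape as $\mathbf{X}$.

```python
import numpy as np

def f(X, a, b):
    # Scalar function a^T X b
    return a @ X @ b

def numerical_gradient(func, X, eps=1e-6):
    # Hypothetical helper: central finite differences, one entry of X at a time.
    # The result has the same shape as X.
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        grad[idx] = (func(X + E) - func(X - E)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
p, m = 4, 3
a = rng.standard_normal(p)
b = rng.standard_normal(m)
X = rng.standard_normal((p, m))

G_num = numerical_gradient(lambda Y: f(Y, a, b), X)
G_formula = np.outer(a, b)  # a b^T
print(np.allclose(G_num, G_formula))  # True
```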

$\operatorname{trace}\big((\mathbf{A}\odot \mathbf{B})\mathbf{C}\big) = \operatorname{trace}\big(\mathbf{A} (\mathbf{B}^\top \odot \mathbf{C})\big)$
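A quick numerical sanity check of the second identity, assuming NumPy, where `*` is the elementwise (Hadamard) product. The shapes are an assumption chosen so that both sides are defined: $\mathbf{A}$ and $\mathbf{B}$ are $n\times m$, and $\mathbf{C}$ is $m\times n$. Both sides equal $\sum_{i,j} A_{ij}B_{ij}C_{ji}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
A = rng.standard_normal((n, m))
B = rng.standard_normal((n, m))   # same shape as A, so A ⊙ B is defined
C = rng.standard_normal((m, n))   # makes (A ⊙ B) C square (n x n)

lhs = np.trace((A * B) @ C)       # trace((A ⊙ B) C)
rhs = np.trace(A @ (B.T * C))     # trace(A (B^T ⊙ C)); B^T and C are both m x n
direct = np.sum(A * B * C.T)      # sum_ij A_ij B_ij C_ji
print(np.isclose(lhs, rhs), np.isclose(lhs, direct))  # True True
```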

UPDATE: To make it simpler, a general problem would be

$$ \frac{\mathbf{a}^\top\left[d\mathbf{Q}\odot f(\mathbf{Q}) \right]g(\mathbf{Q})}{d\mathbf{Q}} $$ where $\mathbf{a}\in \mathbb{R}^{p}$, $f:\mathbb{R}^{p\times m}\rightarrow \mathbb{R}^{p\times m}$ and $g:\mathbb{R}^{p\times m}\rightarrow \mathbb{R}^{m}$.

The closest identity I have encountered is

$$ \frac{\operatorname{trace}(\mathbf{A}d\mathbf{X})}{d\mathbf{X}} = \mathbf{A} $$

from page 2 of this link.
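Whether this identity gives $\mathbf{A}$ or $\mathbf{A}^\top$ depends on the layout convention. If the gradient is arranged with the same shape as $\mathbf{X}$ (the convention behind $A:B=\operatorname{trace}(A^\top B)$), then $\operatorname{trace}(\mathbf{A}\,d\mathbf{X}) = \mathbf{A}^\top : d\mathbf{X}$ and the result is $\mathbf{A}^\top$; the form quoted above corresponds to the transposed layout. The following sketch, assuming NumPy and arbitrary sizes, checks the same-shape version by finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 4, 3
A = rng.standard_normal((m, p))   # trace(A X) needs A to be m x p when X is p x m
X = rng.standard_normal((p, m))
eps = 1e-6

# Central finite differences of f(X) = trace(A X), one entry at a time;
# G has the same shape as X (p x m).
G = np.zeros_like(X)
for idx in np.ndindex(X.shape):
    E = np.zeros_like(X)
    E[idx] = eps
    G[idx] = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * eps)

print(np.allclose(G, A.T))  # True: in the same-shape-as-X layout the result is A^T
```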

Answer:

For convenience, let $$\eqalign{ b &= w\odot G, \qquad a = 1_p \cr }$$ Rearrange the given differential to isolate the gradient with respect to $Q$.
$$\eqalign{
dP &= {\rm Tr}\Big(a^T\,(dQ\odot W)\,b\Big) \cr
&= a^T\,(dQ\odot W)\,b \quad \{\text{the trace of a scalar is the scalar itself}\} \cr
&= {\rm Tr}\Big(b\,a^T\,(dQ\odot W)\Big) \quad \{\text{cyclic property of the trace}\} \cr
&= ab^T:(dQ\odot W) \cr
&= (ab^T\odot W):dQ \quad \{A:(B\odot C) = (A\odot C):B\} \cr
\frac{\partial P}{\partial Q} &= ab^T\odot W \cr
&= \Big(1_p(w\odot G)^T\Big)\odot W \cr
}$$
where a colon denotes the trace (Frobenius) product, i.e. $$A:B = {\rm Tr}\big(A^TB\big)$$
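A minimal numerical check of this derivation, assuming NumPy. Random arrays stand in for $W$, $w$ and $G$ evaluated at some fixed $Q$ (their actual dependence on $Q$ is not specified), and `colon` is a hypothetical helper for the Frobenius product. The last line also illustrates that with $a = 1_p$ the gradient is $W$ with column $j$ scaled by $b_j$, i.e. $W\,{\rm Diag}(b)$.

```python
import numpy as np

def colon(A, B):
    # Hypothetical helper: Frobenius product A : B = Tr(A^T B) = sum_ij A_ij B_ij
    return np.trace(A.T @ B)

rng = np.random.default_rng(3)
p, m = 4, 3
W = rng.standard_normal((p, m))   # stand-in for W at a fixed Q
w = rng.standard_normal(m)        # stand-in for w at a fixed Q
G = rng.standard_normal(m)        # stand-in for G at a fixed Q
b = w * G                         # b = w ⊙ G
a = np.ones(p)                    # a = 1_p
dQ = rng.standard_normal((p, m))  # an arbitrary direction

dP = a @ (dQ * W) @ b             # 1_p^T (dQ ⊙ W)(w ⊙ G)
grad = np.outer(a, b) * W         # (a b^T) ⊙ W

print(np.isclose(dP, colon(grad, dQ)))                 # True
print(np.isclose(colon(grad, dQ), np.sum(grad * dQ)))  # True
print(np.allclose(grad, W @ np.diag(b)))               # True when a = 1_p
```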

UPDATE
The updated question uses $(f,g)$ in place of $(W,b)$, with a general vector $a\in\mathbb{R}^p$ in place of $1_p$, so the gradient becomes $$\eqalign{ \frac{\partial P}{\partial Q} &= ag^T\odot f \cr }$$ where $f$ and $g$ are evaluated at $Q$.
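For the general form, the gradient can also be read off entry by entry: applying the linear map $dQ \mapsto a^T(dQ\odot f)\,g$ to the basis matrix $E_{ij}$ (a one in position $(i,j)$, zeros elsewhere) gives $a_i f_{ij} g_j$, which is entry $(i,j)$ of $ag^T\odot f$. The sketch below assumes NumPy; the particular $f$ (elementwise tanh) and $g$ (column sums of $Q$) are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
p, m = 4, 3
Q = rng.standard_normal((p, m))
a = rng.standard_normal(p)        # a general vector, not 1_p

# Hypothetical choices of f and g, for illustration only.
def f(Q):
    return np.tanh(Q)                  # R^{p x m} -> R^{p x m}

def g(Q):
    return Q.T @ np.ones(Q.shape[0])   # R^{p x m} -> R^m (column sums)

F, gQ = f(Q), g(Q)

def dP(dQ):
    # The given differential a^T [dQ ⊙ f(Q)] g(Q), linear in dQ
    return a @ (dQ * F) @ gQ

# Probe the linear map with each basis matrix E_ij to read off the gradient.
grad_probe = np.zeros((p, m))
for i, j in np.ndindex(p, m):
    E = np.zeros((p, m))
    E[i, j] = 1.0
    grad_probe[i, j] = dP(E)

grad_formula = np.outer(a, gQ) * F     # (a g^T) ⊙ f
print(np.allclose(grad_probe, grad_formula))  # True
```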