How is the chain rule applied here, and why does the result come out as a trace?
I know that $\mathrm{tr}(XY)$ is the standard inner product on symmetric matrices. Is there a generalization of the chain rule, involving an inner product, that is consistent with the example above?

Let $\mathrm{Spd}(p)$ denote the open set of symmetric positive definite $p \times p$ real matrices. The first thing you may be interested in is the differential of the mapping $$ \log \det : \mathrm{Spd}(p) \to \mathbb{R}. $$ Let $\boldsymbol\Sigma \in \mathrm{Spd}(p)$. If $\mathrm{D}_{\boldsymbol\Sigma} \det$ denotes the differential of $\det$ at $\boldsymbol\Sigma$, you can show (Jacobi's formula) that: $$ \mathrm{D}_{\boldsymbol\Sigma} \det \cdot \mathbf{H} = \mathrm{tr}\Big( \mathrm{Com}(\boldsymbol\Sigma)^{\top} \mathbf{H} \Big) $$ where $\mathrm{Com}(\boldsymbol\Sigma)$ denotes the comatrix (matrix of cofactors) of $\boldsymbol\Sigma$. Using the chain rule, you get:
$$ \mathrm{D}_{\boldsymbol\Sigma} \big( \log \circ \det \big) = \mathrm{D}_{\det(\boldsymbol\Sigma)} \log \circ \mathrm{D}_{\boldsymbol\Sigma} \det. $$ Since $\mathrm{D}_{t} \log \cdot s = s/t$ for $t > 0$, and given that $\mathrm{Com}(\boldsymbol\Sigma)^{\top}\boldsymbol\Sigma = \boldsymbol\Sigma \mathrm{Com}(\boldsymbol\Sigma)^{\top} = \det(\boldsymbol\Sigma) \mathrm{I}_{p}$ (so that $\mathrm{Com}(\boldsymbol\Sigma)^{\top} = \det(\boldsymbol\Sigma)\,\boldsymbol\Sigma^{-1}$), you get: $$ \mathrm{D}_{\boldsymbol\Sigma} \log \det \cdot \mathbf{H} = \frac{1}{\det(\boldsymbol\Sigma)} \mathrm{tr}\Big( \mathrm{Com}(\boldsymbol\Sigma)^{\top} \mathbf{H} \Big) = \mathrm{tr}\big( \boldsymbol\Sigma^{-1} \mathbf{H} \big).$$
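A quick numerical sanity check of these formulas may help. The Python/NumPy sketch below is only an illustration. The helper names `comatrix`, `logdet` and `central_difference`, the random test matrices and the step `eps` are assumptions and do not come from the original answer. The sketch does three things:

* It builds the comatrix from cofactors.
* It compares central finite differences of $\det$ and $\log\det$ in a random symmetric direction $\mathbf{H}$ with $\mathrm{tr}(\mathrm{Com}(\boldsymbol\Sigma)^{\top}\mathbf{H})$ and $\mathrm{tr}(\boldsymbol\Sigma^{-1}\mathbf{H})$.
* It checks the identity $\mathrm{Com}(\boldsymbol\Sigma)^{\top}\boldsymbol\Sigma = \det(\boldsymbol\Sigma)\mathrm{I}_{p}$ and the chain-rule step (division by $\det(\boldsymbol\Sigma)$).

```python
import numpy as np


def comatrix(a: np.ndarray) -> np.ndarray:
    """Comatrix (matrix of cofactors): Com(A)[i, j] = (-1)**(i + j) * det(minor_ij)."""
    p = a.shape[0]
    com = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            # Delete row i and column j to get the (i, j) minor.
            minor = np.delete(np.delete(a, i, axis=0), j, axis=1)
            com[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return com


def logdet(sigma: np.ndarray) -> float:
    """log det(Sigma) via slogdet, which avoids overflow/underflow of det itself."""
    sign, value = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("log det needs a matrix with positive determinant")
    return float(value)


def central_difference(func, sigma: np.ndarray, h: np.ndarray, eps: float = 1e-6) -> float:
    """Central finite-difference approximation of D_Sigma func . H."""
    return (func(sigma + eps * h) - func(sigma - eps * h)) / (2 * eps)


rng = np.random.default_rng(0)
p = 4
a = rng.standard_normal((p, p))
sigma = a @ a.T + p * np.eye(p)  # symmetric positive definite by construction
h = rng.standard_normal((p, p))
h = (h + h.T) / 2  # symmetric direction
det_sigma = np.linalg.det(sigma)
com = comatrix(sigma)

# D_Sigma det . H  versus  tr(Com(Sigma)^T H)
d_det_numeric = central_difference(np.linalg.det, sigma, h)
d_det_formula = np.trace(com.T @ h)

# Com(Sigma)^T Sigma = det(Sigma) I_p
identity_holds = np.allclose(com.T @ sigma, det_sigma * np.eye(p), atol=1e-9 * abs(det_sigma))

# D_Sigma(log det) . H  versus  tr(Sigma^{-1} H), solved without forming Sigma^{-1}
d_logdet_numeric = central_difference(logdet, sigma, h)
d_logdet_formula = np.trace(np.linalg.solve(sigma, h))

# Chain-rule step: D log at det(Sigma) divides by det(Sigma)
chain_rule_holds = np.isclose(d_logdet_formula, d_det_formula / det_sigma)

print(d_det_numeric, d_det_formula)
print(identity_holds, chain_rule_holds)
print(d_logdet_numeric, d_logdet_formula)
```

Each printed pair should agree up to finite-difference error. Building the comatrix requires $p^2$ determinants of $(p-1) \times (p-1)$ minors, so that part is only meant for small $p$. The trace form $\mathrm{tr}(\boldsymbol\Sigma^{-1}\mathbf{H})$ needs just one linear solve.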
Then, let $\phi : \mathrm{dom}(f) \to \mathrm{Spd}(p)$ be the mapping defined by: $$\forall \mathbf{x} \in \mathrm{dom}(f), \, \phi(\mathbf{x}) = \mathbf{F}_{0} + x_{1} \mathbf{F}_{1} + \ldots + x_{n} \mathbf{F}_{n}. $$ Then:
$$ f = \log\det \circ \phi. $$
Using the chain rule again, you get that for $\mathbf{x} \in \mathrm{dom}(f)$:
$$ \mathrm{D}_{\mathbf{x}} \big( \log\det \circ \phi \big) = \mathrm{D}_{\phi(\mathbf{x})}(\log\det) \circ \mathrm{D}_{\mathbf{x}}\phi. \tag{$\star$} $$
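This connects $(\star)$ to the inner product mentioned in the question. Take the Frobenius inner product $\langle \mathbf{A}, \mathbf{B} \rangle = \mathrm{tr}(\mathbf{A}^{\top}\mathbf{B})$, which equals $\mathrm{tr}(\mathbf{A}\mathbf{B})$ when $\mathbf{A}$ is symmetric. The formula $\mathrm{D}_{\boldsymbol\Sigma} \log \det \cdot \mathbf{H} = \mathrm{tr}(\boldsymbol\Sigma^{-1}\mathbf{H})$ then says that $\boldsymbol\Sigma^{-1}$ is the gradient of $\log\det$ at $\boldsymbol\Sigma$ with respect to this inner product.

In general, for a differentiable $g : \mathrm{Spd}(p) \to \mathbb{R}$ with gradient $\nabla g$ with respect to $\langle \cdot, \cdot \rangle$, the chain rule reads $\mathrm{D}_{\mathbf{x}}(g \circ \phi) \cdot \mathbf{h} = \langle \nabla g(\phi(\mathbf{x})), \mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{h} \rangle$. Since $\phi$ is affine, $\mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{h} = h_{1}\mathbf{F}_{1} + \ldots + h_{n}\mathbf{F}_{n}$ for $\mathbf{h} \in \mathbb{R}^{n}$. With $g = \log\det$, $(\star)$ therefore becomes:

$$ \mathrm{D}_{\mathbf{x}} f \cdot \mathbf{h} = \big\langle \phi(\mathbf{x})^{-1}, \, \mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{h} \big\rangle = \sum_{i=1}^{n} h_{i} \, \mathrm{tr}\big( \phi(\mathbf{x})^{-1} \mathbf{F}_{i} \big). $$

Taking $\mathbf{h}$ to be the $i$-th canonical basis vector keeps a single term of this sum.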
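Finally, let $(\mathbf{e}_{1},\ldots,\mathbf{e}_{n})$ denote the canonical basis of $\mathbb{R}^{n}$. Then, by definition:
$$ \forall i \in \lbrace 1,\ldots,n \rbrace, \, \forall \mathbf{x} \in \mathrm{dom}(f), \; \frac{\partial f}{\partial x_{i}}(\mathbf{x}) = \mathrm{D}_{\mathbf{x}} f \cdot \mathbf{e}_{i}. $$
Since $\phi$ is affine, $\mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{e}_{i} = \mathbf{F}_{i}$. Using $(\star)$ and the expression of $\mathrm{D}_{\boldsymbol\Sigma} \log\det$ obtained above, you obtain that for all $i \in \lbrace 1,\ldots,n \rbrace$ and for all $\mathbf{x} \in \mathrm{dom}(f)$:
$$ \frac{\partial f}{\partial x_{i}}(\mathbf{x}) = \mathrm{D}_{\phi(\mathbf{x})}(\log\det) \cdot \mathbf{F}_{i} = \mathrm{tr}\big( \phi(\mathbf{x})^{-1} \mathbf{F}_{i} \big). $$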
This is the formula which is given in your post, with $\mathbf{F} = \phi(\mathbf{x})$.
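Putting everything together, the following sketch checks $\frac{\partial f}{\partial x_{i}}(\mathbf{x}) = \mathrm{tr}(\phi(\mathbf{x})^{-1}\mathbf{F}_{i})$ against central finite differences for $f = \log\det \circ \phi$ with $\phi(\mathbf{x}) = \mathbf{F}_{0} + x_{1}\mathbf{F}_{1} + \ldots + x_{n}\mathbf{F}_{n}$. The random matrices $\mathbf{F}_{i}$ and the point $\mathbf{x}$ are illustrative assumptions, not data from the original post. The same goes for the function names `phi`, `f`, `grad_f` and `grad_f_numeric`.

```python
import numpy as np


def phi(x: np.ndarray, fs: list) -> np.ndarray:
    """phi(x) = F_0 + x_1 F_1 + ... + x_n F_n, with fs = [F_0, F_1, ..., F_n]."""
    return fs[0] + sum(xi * fi for xi, fi in zip(x, fs[1:]))


def f(x: np.ndarray, fs: list) -> float:
    """f(x) = log det phi(x), computed from a Cholesky factor phi(x) = L L^T."""
    # For symmetric input, Cholesky fails (LinAlgError) when phi(x) is not positive definite.
    chol = np.linalg.cholesky(phi(x, fs))
    return 2.0 * float(np.sum(np.log(np.diag(chol))))


def grad_f(x: np.ndarray, fs: list) -> np.ndarray:
    """Closed form: df/dx_i = tr(phi(x)^{-1} F_i), using solves instead of an explicit inverse."""
    big_f = phi(x, fs)
    return np.array([np.trace(np.linalg.solve(big_f, fi)) for fi in fs[1:]])


def grad_f_numeric(x: np.ndarray, fs: list, eps: float = 1e-6) -> np.ndarray:
    """Central finite differences along each canonical basis vector e_i."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        step = np.zeros(len(x))
        step[i] = eps
        g[i] = (f(x + step, fs) - f(x - step, fs)) / (2 * eps)
    return g


rng = np.random.default_rng(2)
p, n = 3, 2
a = rng.standard_normal((p, p))
fs = [a @ a.T + p * np.eye(p)]  # F_0: symmetric positive definite
for _ in range(n):
    b = rng.standard_normal((p, p))
    fs.append(0.1 * (b + b.T))  # F_1, ..., F_n: small symmetric perturbations
x = np.array([0.3, -0.5])  # assumed to lie in dom(f) (perturbations are small)

print(grad_f(x, fs))
print(grad_f_numeric(x, fs))
```

Both printed vectors should agree up to finite-difference error. The Cholesky factorization makes `f` raise an error outside $\mathrm{dom}(f)$ rather than silently returning a meaningless value.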