Chain rule, partial derivative and inner product?


[Image from the original post not available.]

How is the chain rule applied here, and why does the result come out as a trace?

I know that $\mathrm{tr}(XY)$ is the standard inner product on the space of symmetric matrices. Is there a generalization of the chain rule, involving an inner product, that is consistent with the example above?

Best answer:

Let $\mathrm{Spd}(p)$ denote the open set of symmetric positive definite $p \times p$ real matrices. The first thing you may be interested in is the differential of the mapping $$ \log \det : \mathrm{Spd}(p) \to \mathbb{R}. $$ Let $\boldsymbol\Sigma \in \mathrm{Spd}(p)$. If $\mathrm{D}_{\boldsymbol\Sigma} \det$ denotes the differential of $\det$ at $\boldsymbol\Sigma$, you can show (this is Jacobi's formula) that, for every symmetric $p \times p$ matrix $\mathbf{H}$: $$ \mathrm{D}_{\boldsymbol\Sigma} \det \cdot \mathbf{H} = \mathrm{tr}\Big( \mathrm{Com}(\boldsymbol\Sigma)^{\top} \mathbf{H} \Big) $$ where $\mathrm{Com}(\boldsymbol\Sigma)$ denotes the comatrix (the matrix of cofactors) of $\boldsymbol\Sigma$. Using the chain rule, you get:

$$ \mathrm{D}_{\boldsymbol\Sigma} \big( \log \circ \det \big) = \mathrm{D}_{\det(\boldsymbol\Sigma)} \log \circ \mathrm{D}_{\boldsymbol\Sigma} \det. $$ Since $\det(\boldsymbol\Sigma) > 0$ and $\mathrm{D}_{t} \log \cdot s = s/t$ for $t > 0$, and since $\mathrm{Com}(\boldsymbol\Sigma)^{\top}\boldsymbol\Sigma = \boldsymbol\Sigma \mathrm{Com}(\boldsymbol\Sigma)^{\top} = \det(\boldsymbol\Sigma) \mathrm{I}_{p}$ (so that $\boldsymbol\Sigma^{-1} = \mathrm{Com}(\boldsymbol\Sigma)^{\top} / \det(\boldsymbol\Sigma)$), you get: $$ \mathrm{D}_{\boldsymbol\Sigma} \log \det \cdot \mathbf{H} = \frac{1}{\det(\boldsymbol\Sigma)} \mathrm{tr}\Big( \mathrm{Com}(\boldsymbol\Sigma)^{\top} \mathbf{H} \Big) = \mathrm{tr}\big( \boldsymbol\Sigma^{-1} \mathbf{H} \big).$$
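If you want to convince yourself numerically, here is a minimal sanity check of the two formulas above, sketched in Python with NumPy (an assumption; the answer itself uses no code). The helper names `comatrix` and `logdet` are hypothetical. The comatrix is computed directly from cofactors, so the identity $\mathrm{Com}(\boldsymbol\Sigma)^{\top}\boldsymbol\Sigma = \det(\boldsymbol\Sigma)\mathrm{I}_p$ is actually tested rather than assumed, and both differentials are compared with central finite differences along a random symmetric direction $\mathbf{H}$.

```python
import numpy as np

def comatrix(S):
    """Comatrix (cofactor matrix): Com(S)[i, j] = (-1)**(i+j) * det(S without row i and column j)."""
    p = S.shape[0]
    C = np.empty_like(S, dtype=float)
    for i in range(p):
        for j in range(p):
            minor = np.delete(np.delete(S, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

def logdet(S):
    # slogdet returns (sign, log|det S|); for an SPD matrix the sign is +1
    return np.linalg.slogdet(S)[1]

rng = np.random.default_rng(0)
p, t = 4, 1e-5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)   # A A^T is PSD; adding p*I makes Sigma positive definite
B = rng.standard_normal((p, p))
H = (B + B.T) / 2                 # a symmetric direction

# Com(Sigma)^T Sigma = det(Sigma) I_p  (checked after dividing by det(Sigma))
identity_ok = np.allclose(comatrix(Sigma).T @ Sigma / np.linalg.det(Sigma), np.eye(p))

# Jacobi's formula vs. a central finite difference of det along H
jacobi = np.trace(comatrix(Sigma).T @ H)
fd_det = (np.linalg.det(Sigma + t * H) - np.linalg.det(Sigma - t * H)) / (2 * t)

# D log det . H = tr(Sigma^{-1} H) vs. a central finite difference of log det along H
exact_logdet = np.trace(np.linalg.solve(Sigma, H))
fd_logdet = (logdet(Sigma + t * H) - logdet(Sigma - t * H)) / (2 * t)

print(identity_ok, jacobi, fd_det, exact_logdet, fd_logdet)
```

The paired values should agree up to finite-difference error, not to machine precision.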

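Next, let $\phi : \mathrm{dom}(f) \to \mathrm{Spd}(p)$ be the mapping defined by: $$\forall \mathbf{x} \in \mathrm{dom}(f), \, \phi(\mathbf{x}) = \mathbf{F}_{0} + x_{1} \mathbf{F}_{1} + \ldots + x_{n} \mathbf{F}_{n}, $$ where $\mathbf{F}_{0}, \ldots, \mathbf{F}_{n}$ are symmetric $p \times p$ matrices and $\mathrm{dom}(f) \subset \mathbb{R}^{n}$ is the open set of points $\mathbf{x}$ for which this matrix is positive definite. Then: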

$$ f = \log\det \circ \phi. $$
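To evaluate $f = \log\det \circ \phi$ in practice, one common approach (a sketch, not something this answer prescribes) is a Cholesky factorization $\phi(\mathbf{x}) = \mathbf{L}\mathbf{L}^{\top}$: it fails when $\phi(\mathbf{x})$ is not (numerically) positive definite, i.e. when $\mathbf{x} \notin \mathrm{dom}(f)$, and otherwise $\log\det\phi(\mathbf{x}) = 2\sum_k \log L_{kk}$. The stacked array `F` of shape `(n+1, p, p)`, the function names `phi` and `f`, and the 2×2 data below are hypothetical, with the $\mathbf{F}_i$ assumed symmetric.

```python
import numpy as np

def phi(x, F):
    """Affine map phi(x) = F[0] + x[0] F[1] + ... + x[n-1] F[n]; F has shape (n+1, p, p)."""
    return F[0] + np.tensordot(x, F[1:], axes=1)

def f(x, F):
    """f(x) = log det phi(x); raises ValueError when x is outside dom(f)."""
    try:
        # Cholesky succeeds only if phi(x) is (numerically) positive definite.
        # NumPy reads only the lower triangle, so the F[i] are assumed symmetric.
        L = np.linalg.cholesky(phi(x, F))
    except np.linalg.LinAlgError:
        raise ValueError("phi(x) is not positive definite: x is not in dom(f)") from None
    # phi(x) = L L^T, hence det phi(x) = prod(diag(L))**2
    return 2.0 * np.sum(np.log(np.diag(L)))

# Hypothetical example with n = 2, p = 2
F = np.array([np.eye(2),
              [[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
print(f(np.array([1.0, 0.5]), F))   # log det [[2, 0.5], [0.5, 1]] = log(1.75)
```

For $\mathbf{x} = (0, 2)$, $\phi(\mathbf{x})$ has determinant $-3$, so that call raises `ValueError`.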

Using the chain rule again, you get, for every $\mathbf{x} \in \mathrm{dom}(f)$:

$$ \mathrm{D}_{\mathbf{x}} \big( \log\det \circ \phi \big) = \mathrm{D}_{\phi(\mathbf{x})}(\log\det) \circ \mathrm{D}_{\mathbf{x}}\phi. \tag{$\star$} $$
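Applied to a direction $\mathbf{h} \in \mathbb{R}^n$, $(\star)$ says $\mathrm{D}_{\mathbf{x}} f \cdot \mathbf{h} = \mathrm{tr}\big(\phi(\mathbf{x})^{-1}\, (\mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{h})\big)$, and since $\phi$ is affine, $\mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{h} = h_1\mathbf{F}_1 + \ldots + h_n\mathbf{F}_n$ at every $\mathbf{x}$. A minimal NumPy sketch of that composition, compared with a central finite difference; the matrices `F0`, `Fs`, the point `x` and the direction `h` are made-up data chosen so that $\phi$ stays positive definite near `x`.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 2
# Made-up data: F0 is SPD and F_1..F_n are symmetric, so phi(x) is SPD for small x
F0 = 10.0 * np.eye(p)
Fs = [(B + B.T) / 2 for B in rng.standard_normal((n, p, p))]

def phi(x):
    return F0 + sum(xi * Fi for xi, Fi in zip(x, Fs))

def D_phi(h):
    # phi is affine, so D_x phi . h = h_1 F_1 + ... + h_n F_n at every x
    return sum(hi * Fi for hi, Fi in zip(h, Fs))

def logdet(S):
    return np.linalg.slogdet(S)[1]

def D_f(x, h):
    # (*): D_x f . h = D_{phi(x)}(log det) . (D_x phi . h) = tr(phi(x)^{-1} (D_x phi . h))
    return np.trace(np.linalg.solve(phi(x), D_phi(h)))

x = np.array([0.3, -0.2])
h = np.array([1.0, 2.0])
t = 1e-5
fd = (logdet(phi(x + t * h)) - logdet(phi(x - t * h))) / (2 * t)
print(D_f(x, h), fd)
```

Because $\mathrm{D}_{\mathbf{x}} f$ is linear in $\mathbf{h}$, taking $\mathbf{h} = \mathbf{e}_i$ gives the partial derivatives.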

Finally, let $(\mathbf{e}_{1},\ldots,\mathbf{e}_{n})$ denote the canonical basis of $\mathbb{R}^{n}$. By definition of the partial derivatives:

$$ \forall i \in \lbrace 1,\ldots,n \rbrace, \, \forall \mathbf{x} \in \mathrm{dom}(f), \; \frac{\partial f}{\partial x_{i}}(\mathbf{x}) = \mathrm{D}_{\mathbf{x}} f \cdot \mathbf{e}_{i}. $$

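Since $\phi$ is affine, $\mathrm{D}_{\mathbf{x}}\phi \cdot \mathbf{e}_{i} = \mathbf{F}_{i}$ for every $\mathbf{x}$. Using $(\star)$ together with the differential of $\log\det$ computed above, you obtain, for all $i \in \lbrace 1,\ldots,n \rbrace$ and for all $\mathbf{x} \in \mathrm{dom}(f)$: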

$$ \frac{\partial f}{\partial x_{i}}(\mathbf{x}) = \mathrm{D}_{\phi(\mathbf{x})}(\log\det) \cdot \mathbf{F}_{i} = \mathrm{tr}\big( \phi(\mathbf{x})^{-1} \mathbf{F}_{i} \big). $$
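Collecting the $n$ partial derivatives gives the gradient of $f$. A NumPy sketch (the name `grad_f` and the stacked array `F` of shape `(n+1, p, p)` are hypothetical) that evaluates each $\mathrm{tr}(\phi(\mathbf{x})^{-1}\mathbf{F}_i)$ as the elementwise sum $\sum_{j,k} (\phi(\mathbf{x})^{-1})_{jk} (\mathbf{F}_i)_{jk}$, i.e. the trace inner product from the question, valid here because $\phi(\mathbf{x})^{-1}$ is symmetric. In the 2×2 example, $\det\phi(\mathbf{x}) = 1 + x_1 - x_2^2$, so differentiating by hand gives $1/\det\phi(\mathbf{x})$ and $-2x_2/\det\phi(\mathbf{x})$, i.e. $\pm 4/7$ at $\mathbf{x} = (1, 0.5)$.

```python
import numpy as np

def grad_f(x, F):
    """Gradient of f(x) = log det(F[0] + x[0] F[1] + ... + x[n-1] F[n]).

    Component i is tr(phi(x)^{-1} F[i+1]), computed as the elementwise sum
    sum(phi(x)^{-1} * F[i+1]); the two agree because phi(x)^{-1} is symmetric.
    """
    P = F[0] + np.tensordot(x, F[1:], axes=1)   # phi(x)
    P_inv = np.linalg.inv(P)
    return np.array([np.sum(P_inv * Fi) for Fi in F[1:]])

# Hypothetical example: phi(x) = [[1 + x_1, x_2], [x_2, 1]]
F = np.array([np.eye(2),
              [[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
x = np.array([1.0, 0.5])
print(grad_f(x, F))   # approximately [4/7, -4/7]
```

Forming the explicit inverse is fine for small $p$; for larger $p$ one would typically reuse a single factorization of $\phi(\mathbf{x})$ instead.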

This is the formula given in your post, with $\mathbf{F} = \phi(\mathbf{x})$.
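As for the second part of the question: writing $\langle \mathbf{X}, \mathbf{Y} \rangle = \mathrm{tr}(\mathbf{X}\mathbf{Y})$ for symmetric $\mathbf{X}, \mathbf{Y}$, the differential computed above says that the gradient of $\log\det$ at $\boldsymbol\Sigma$ with respect to this inner product is $\boldsymbol\Sigma^{-1}$, and $(\star)$ then takes the familiar "gradient paired with the derivative of the inner map" form. The following only restates the computation above in that notation:

$$ \mathrm{D}_{\boldsymbol\Sigma} \log\det \cdot \mathbf{H} = \big\langle \boldsymbol\Sigma^{-1}, \mathbf{H} \big\rangle \quad \text{for all symmetric } \mathbf{H}, \qquad \text{i.e.} \qquad \nabla \log\det(\boldsymbol\Sigma) = \boldsymbol\Sigma^{-1}, $$

$$ \frac{\partial f}{\partial x_{i}}(\mathbf{x}) = \Big\langle \nabla \log\det\big(\phi(\mathbf{x})\big), \frac{\partial \phi}{\partial x_{i}}(\mathbf{x}) \Big\rangle = \big\langle \phi(\mathbf{x})^{-1}, \mathbf{F}_{i} \big\rangle = \mathrm{tr}\big( \phi(\mathbf{x})^{-1} \mathbf{F}_{i} \big). $$

More generally, for any differentiable $g : \mathrm{Spd}(p) \to \mathbb{R}$ whose gradient $\nabla g(\boldsymbol\Sigma)$ is defined by $\mathrm{D}_{\boldsymbol\Sigma} g \cdot \mathbf{H} = \langle \nabla g(\boldsymbol\Sigma), \mathbf{H} \rangle$ for all symmetric $\mathbf{H}$, the same argument gives $\frac{\partial (g \circ \phi)}{\partial x_{i}}(\mathbf{x}) = \big\langle \nabla g(\phi(\mathbf{x})), \frac{\partial \phi}{\partial x_{i}}(\mathbf{x}) \big\rangle$.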