Prove $\frac{\partial \ln|X|}{\partial X_{ij}}=\text{tr}\left[X^{-1} \frac{\partial X}{\partial X_{ij}}\right]$ without using the adjoint.


This is one step in proving the following: Prove $\frac{\partial \ln|X|}{\partial X} = 2X^{-1} - \text{diag}(X^{-1})$.

Prove

$$\frac{\partial \ln \lvert X \rvert}{\partial X_{ij}}=\text{tr} \left[X^{-1} \frac{\partial X}{\partial X_{ij}} \right]$$ where $X \in \mathbb{R}^{p \times p}$ is a positive definite matrix for some $p \in \mathbb{N}$.

In words: the derivative of the logarithm of the determinant of a matrix with respect to one of its entries is the trace of the inverse of the matrix times the derivative of the matrix with respect to that entry.
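Before attempting a proof, the identity can be checked numerically. The following is a minimal sketch, assuming NumPy is available and treating $X$ as symmetric, so that $\frac{\partial X}{\partial X_{ij}}$ has ones at positions $(i,j)$ and $(j,i)$. The test matrix and the helper names (`trace_formula`, `finite_difference`, `symmetric_direction`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def trace_formula(X, dX):
    """tr(X^{-1} dX), computed with a linear solve instead of an explicit inverse."""
    return np.trace(np.linalg.solve(X, dX))

def finite_difference(X, dX, h=1e-6):
    """Central-difference estimate of d/dt ln|X + t*dX| at t = 0."""
    _, plus = np.linalg.slogdet(X + h * dX)
    _, minus = np.linalg.slogdet(X - h * dX)
    return (plus - minus) / (2.0 * h)

def symmetric_direction(p, i, j):
    """dX/dX_ij for a symmetric X: ones at (i, j) and (j, i), zeros elsewhere."""
    E = np.zeros((p, p))
    E[i, j] = 1.0
    E[j, i] = 1.0
    return E

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4.0 * np.eye(4)  # hypothetical positive definite test matrix

# Largest disagreement between the trace formula and the numerical derivative
max_gap = max(
    abs(trace_formula(X, symmetric_direction(4, i, j))
        - finite_difference(X, symmetric_direction(4, i, j)))
    for i in range(4) for j in range(4)
)
print(max_gap)
```

A small `max_gap` is only a sanity check, not a proof; it does suggest that the statement is at least plausible for symmetric positive definite $X$.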

Condition: Do not use the adjoint. For a proof that uses the adjoint, see: Prove $\frac{\partial \ln|X|}{\partial X_{ij}}=\text{tr}\left[X^{-1} \frac{\partial X}{\partial X_{ij}}\right]$ using adjoint.

Note: in case any of this notation seems wrong, see the Matrix Cookbook: (141) on p. 15, (57) on p. 9, and (43) on p. 8.


There are 2 best solutions below


What I tried, based on Theorem 2.14f in Rencher and Schaalje:

Follow the proof of (or directly apply) Theorem 2.14f with $A = X$ and $x = X_{ij}$.

Since positive definite matrices are symmetric, $X$ has a spectral decomposition $X = CDC'$, where $C$ is an orthogonal matrix and $D$ is a diagonal matrix with the eigenvalues of $X$ on its main diagonal and zeros elsewhere. Throughout, $C$ and $D$ are assumed to depend differentiably on $X_{ij}$ (this holds, for instance, when the eigenvalues are distinct). Hence $$ \frac{\partial \ln|X|}{\partial X_{ij}} = \frac{\partial \ln|CDC'|}{\partial X_{ij}}. $$

Write $\lambda_1, \dots, \lambda_p$ for the eigenvalues of $X$, that is, the diagonal entries of $D$. Since $|C||C'| = |CC'| = |I| = 1$,

\begin{align}
\frac{\partial \ln|CDC'|}{\partial X_{ij}} =&\frac{\partial \ln(|C||D||C'|)}{\partial X_{ij}}\\
=&\frac{\partial \ln|D|}{\partial X_{ij}}\\
=&\frac{\partial \ln(\prod_{k=1}^{p} \lambda_k)}{\partial X_{ij}}\\
=&\frac{\partial (\ln\lambda_1 + \ln\lambda_2 + \dots + \ln\lambda_p)}{\partial X_{ij}}\\
=&\frac{\partial (\sum_{k=1}^{p} \ln\lambda_k)}{\partial X_{ij}}\\
=&\sum_{k=1}^{p} \frac{1}{\lambda_k} \cdot \frac{\partial \lambda_k}{\partial X_{ij}}\\
=&\text{tr}\left(D^{-1} \frac{\partial D}{\partial X_{ij}}\right)
\end{align}

The last step is justified as follows. $D^{-1}$ is the diagonal matrix whose diagonal entries are the reciprocals of the corresponding entries of $D$:

$$ \text{tr}(D^{-1}) = \sum_{k=1}^{p} \frac{1}{\lambda_k} = \sum_{k=1}^{p} (D^{-1})_{kk}. $$

$\frac{\partial D}{\partial X_{ij}}$ is the diagonal matrix whose diagonal entries are the partial derivatives of the corresponding entries of $D$ with respect to $X_{ij}$:

$$ \text{tr}\left(\frac{\partial D}{\partial X_{ij}}\right) = \sum_{k=1}^{p} \frac{\partial \lambda_k}{\partial X_{ij}} = \sum_{k=1}^{p} \left(\frac{\partial D}{\partial X_{ij}}\right)_{kk}. $$

Finally, $\sum_{k=1}^{p} (D^{-1})_{kk} \left(\frac{\partial D}{\partial X_{ij}}\right)_{kk} = \text{tr}\left(D^{-1} \frac{\partial D}{\partial X_{ij}}\right)$, because for diagonal matrices the matrix product $D^{-1} \frac{\partial D}{\partial X_{ij}}$ coincides with the entrywise product of $D^{-1}$ and $\frac{\partial D}{\partial X_{ij}}$.
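The eigenvalue chain above can be checked numerically. The following is a rough sketch, assuming NumPy and a hypothetical symmetric test matrix built to have the distinct eigenvalues 1, 2, 3, 4. It approximates $\frac{\partial \lambda_k}{\partial X_{ij}}$ by central differences and compares $\sum_k \frac{1}{\lambda_k}\frac{\partial \lambda_k}{\partial X_{ij}}$ with a finite-difference derivative of $\ln|X|$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical test matrix X = Q diag(1, 2, 3, 4) Q' with Q orthogonal
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
X = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T
X = (X + X.T) / 2.0  # remove rounding asymmetry

# ln|X| equals the sum of the logs of the eigenvalues
lam = np.linalg.eigvalsh(X)
_, logdet = np.linalg.slogdet(X)
sum_log_eigs = np.sum(np.log(lam))

# Perturbation dX/dX_01 for a symmetric matrix
dX = np.zeros((4, 4))
dX[0, 1] = 1.0
dX[1, 0] = 1.0

h = 1e-6
# Central-difference estimate of each d(lambda_k)/dX_01
dlam = (np.linalg.eigvalsh(X + h * dX) - np.linalg.eigvalsh(X - h * dX)) / (2.0 * h)
trace_D_inv_dD = np.sum(dlam / lam)  # tr(D^{-1} dD)

# Central-difference estimate of d ln|X| / dX_01
_, plus = np.linalg.slogdet(X + h * dX)
_, minus = np.linalg.slogdet(X - h * dX)
dlogdet = (plus - minus) / (2.0 * h)

print(abs(logdet - sum_log_eigs), abs(trace_D_inv_dD - dlogdet))
```

Distinct eigenvalues are chosen on purpose, so that the sorted eigenvalues returned by `eigvalsh` vary smoothly under the small perturbation.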

It remains to prove that $\text{tr}\left(D^{-1} \frac{\partial D}{\partial X_{ij}}\right) = \text{tr}\left(X^{-1} \frac{\partial X}{\partial X_{ij}}\right)$. Since $CC' = I$, we have $\frac{\partial (CC')}{\partial X_{ij}} = 0$ and $X^{-1} = CD^{-1}C'$. Using the product rule and the cyclic property of the trace,
\begin{align}
\text{tr}\left(D^{-1} \frac{\partial D}{\partial X_{ij}}\right) &= \text{tr}\left(D^{-1} \frac{\partial D}{\partial X_{ij}} + \frac{\partial (CC')}{\partial X_{ij}}\right) \\
&= \text{tr}\left(D^{-1} \frac{\partial D}{\partial X_{ij}} + C\frac{\partial C'}{\partial X_{ij}} + \frac{\partial C}{\partial X_{ij}}C'\right) \\
&=\text{tr}\left(CD^{-1} \frac{\partial D}{\partial X_{ij}}C' + C\frac{\partial C'}{\partial X_{ij}} + CD^{-1}C'\frac{\partial C}{\partial X_{ij}}DC'\right) \\
&=\text{tr}\left((CD^{-1}C')\left(C \frac{\partial D}{\partial X_{ij}}C' + CD\frac{\partial C'}{\partial X_{ij}} + \frac{\partial C}{\partial X_{ij}}DC'\right)\right) \\
&=\text{tr}\left(CD^{-1}C' \left[C\frac{\partial (DC')}{\partial X_{ij}} + \frac{\partial C}{\partial X_{ij}}(DC')\right]\right)\\
&=\text{tr}\left(CD^{-1}C' \frac{\partial (CDC')}{\partial X_{ij}}\right)\\
&=\text{tr}\left(X^{-1} \frac{\partial X}{\partial X_{ij}}\right)\\
&\hspace{10cm}{\it QED}
\end{align}
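To make the role of the orthogonality terms concrete, here is a small sketch on a hypothetical $2 \times 2$ example in which $C$ is a rotation matrix whose derivative is known in closed form. The angle, the eigenvalues, and their rates of change are arbitrary illustrative values, and NumPy is assumed.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix, an orthogonal C."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rotation_derivative(theta):
    """Entrywise derivative of rotation(theta) with respect to theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[-s, -c], [c, -s]])

theta, dtheta = 0.7, 0.4      # arbitrary angle and rate of change
D = np.diag([2.0, 5.0])       # arbitrary positive eigenvalues
dD = np.diag([0.3, -1.1])     # arbitrary eigenvalue derivatives

C = rotation(theta)
dC = dtheta * rotation_derivative(theta)

X = C @ D @ C.T
# Product rule applied to X = C D C'
dX = dC @ D @ C.T + C @ dD @ C.T + C @ D @ dC.T

# d(CC') = dC C' + C dC' vanishes because CC' = I
orthogonality_term = np.trace(dC @ C.T + C @ dC.T)

lhs = np.trace(np.linalg.solve(D, dD))  # tr(D^{-1} dD)
rhs = np.trace(np.linalg.solve(X, dX))  # tr(X^{-1} dX)
print(orthogonality_term, lhs, rhs)
```

Here `lhs` is $0.3/2 - 1.1/5 = -0.07$, and `rhs` should agree with it up to rounding because the terms involving `dC` cancel inside the trace.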

Rencher, Alvin, and G. Bruce Schaalje. "Matrix Algebra." Linear Models in Statistics. Second ed. New Jersey: John Wiley and Sons, Inc., 2008. 5-68. Print.

2
On

Denote the eigenvalues of $X$ by $\{\lambda_k\}$. Then, for a matrix function $F$ such as $\exp$ or $\log$, the eigenvalues of $F(X)$ are given by $\{F(\lambda_k)\}$.

Recall that the exponential function turns sums into products, $$\prod_k \exp(\lambda_k) = \exp\Bigg(\sum_k \lambda_k\Bigg)$$ or, in terms of $X$, $$\det\Big(\exp(X)\Big)=\exp\Big({\rm tr}(X)\Big)$$ Setting $X=\log(Y)$ and taking the log of both sides yields $$\log\Big(\det(Y)\Big)={\rm tr}\Big(\log(Y)\Big)$$

Next, consider the differential of the trace of a simple power function $$\eqalign{ d\,{\rm tr}(X^3) &= {\rm tr}(X^2(dX)+X(dX)X+(dX)X^2)\cr &= {\rm tr}(\,3\,X^2\,dX\,) \cr }$$ where the collection of like powers was made possible by the cyclic property of the trace.
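Both facts can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and restricting to symmetric matrices so that $\exp$ and $\log$ can be applied through the eigendecomposition. The helper `apply_to_eigenvalues` and the random test matrices are hypothetical.

```python
import numpy as np

def apply_to_eigenvalues(S, f):
    """Return C f(D) C' for a symmetric matrix S = C D C'."""
    lam, C = np.linalg.eigh(S)
    return (C * f(lam)) @ C.T  # scales column k of C by f(lam[k])

def trace_cubed(M):
    """tr(M^3)."""
    return np.trace(M @ M @ M)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
S = (A + A.T) / 2.0   # symmetric test matrix
dS = (B + B.T) / 2.0  # symmetric perturbation direction

# det(exp(S)) versus exp(tr(S))
Y = apply_to_eigenvalues(S, np.exp)
det_of_exp = np.linalg.det(Y)
exp_of_trace = np.exp(np.trace(S))

# log(det(Y)) versus tr(log(Y))
_, logdet_Y = np.linalg.slogdet(Y)
trace_of_log = np.trace(apply_to_eigenvalues(Y, np.log))

# d tr(S^3) in the direction dS versus tr(3 S^2 dS)
h = 1e-4
fd = (trace_cubed(S + h * dS) - trace_cubed(S - h * dS)) / (2.0 * h)
formula = np.trace(3.0 * S @ S @ dS)

print(det_of_exp, exp_of_trace, logdet_Y, trace_of_log, fd, formula)
```

The symmetric restriction is a convenience of this sketch; the identities themselves are stated in the answer for a general $X$.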

Extending this to a generic function $f(X)$ yields $$d\,{\rm tr}\big(\,f(X)\,\big) = {\rm tr}\big(\,f'(X)\,\,dX\,\big)$$ Taking $f=\log$, so that $f'(X)=X^{-1}$, and combining this with the previous result yields $$\eqalign{ d\,\log\big(\det(X)\big) &= d\,{\rm tr}\big(\log(X)\big) \cr &= {\rm tr}\big(X^{-1}\,dX\big) \cr }$$
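The rule $d\,{\rm tr}(f(X)) = {\rm tr}(f'(X)\,dX)$ can be probed with finite differences. This is a sketch under assumptions: NumPy, a hypothetical symmetric positive definite test matrix, and matrix functions applied through the eigendecomposition. It tries $f=\exp$ and $f=\log$; the helper names are illustrative only.

```python
import numpy as np

def apply_to_eigenvalues(S, f):
    """Return C f(D) C' for a symmetric matrix S = C D C'."""
    lam, C = np.linalg.eigh(S)
    return (C * f(lam)) @ C.T

def trace_f_fd(X, dX, f, h=1e-5):
    """Central-difference estimate of d/dt tr(f(X + t*dX)) at t = 0."""
    plus = np.trace(apply_to_eigenvalues(X + h * dX, f))
    minus = np.trace(apply_to_eigenvalues(X - h * dX, f))
    return (plus - minus) / (2.0 * h)

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
X = A @ A.T / 3.0 + np.eye(3)  # hypothetical positive definite test matrix
dX = (B + B.T) / 2.0           # symmetric perturbation direction

relative_gaps = {}
for name, f, f_prime in [
    ("exp", np.exp, np.exp),
    ("log", np.log, lambda x: 1.0 / x),
]:
    fd = trace_f_fd(X, dX, f)
    formula = np.trace(apply_to_eigenvalues(X, f_prime) @ dX)  # tr(f'(X) dX)
    relative_gaps[name] = abs(fd - formula) / max(1.0, abs(formula))

print(relative_gaps)
```

For $f=\log$ the formula reduces to ${\rm tr}(X^{-1}\,dX)$, which is the result derived above.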