derivative of logarithm of determinant


I am trying to read "Pattern Recognition and Machine Learning", and in the appendix there is a formula with no proof. The author suggests deriving it from the four formulas given below, where $\lambda_i$ and $\mathbf u_i$ are the eigenvalues and eigenvectors of $\mathbf A$. Any hint as to how to tackle this proof? Most proofs on the internet use Jacobi's formula. The book is available online: http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf

Trying to prove C.22.

$$\frac { \partial } { \partial x } \ln | \mathbf { A } | = \operatorname { Tr } \left( \mathbf { A } ^ { - 1 } \frac { \partial \mathbf { A } } { \partial x } \right)$$

  1. $\mathbf { u } _ { i } ^ { \mathrm { T } } \mathbf { u } _ { j } = I _ { i j }$
  2. $ \mathbf { A } = \sum _ { i = 1 } ^ { M } \lambda _ { i } \mathbf { u } _ { i } \mathbf { u } _ { i } ^ { \mathrm { T } } $
  3. $ \mathbf { A } ^ { - 1 } = \sum _ { i = 1 } ^ { M } \frac { 1 } { \lambda _ { i } } \mathbf { u } _ { i } \mathbf { u } _ { i } ^ { \mathrm { T } } $
  4. $ | \mathbf { A } | = \prod _ { i = 1 } ^ { M } \lambda _ { i } $
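The four hint formulas are just the spectral decomposition of a symmetric matrix. As a quick sanity check, they can be verified numerically (a minimal sketch with numpy; the matrix `A` below is an arbitrary symmetric positive definite example, not anything from the book):

```python
import numpy as np

# Build a random symmetric positive definite matrix A and check
# identities (1)-(4) via its eigendecomposition.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)      # symmetric positive definite

lam, U = np.linalg.eigh(A)       # columns of U are the eigenvectors u_i

# (1) orthonormality: u_i^T u_j = delta_ij, i.e. U^T U = I
assert np.allclose(U.T @ U, np.eye(4))

# (2) spectral decomposition: A = sum_i lam_i u_i u_i^T
assert np.allclose(A, (U * lam) @ U.T)

# (3) inverse: A^{-1} = sum_i lam_i^{-1} u_i u_i^T
assert np.allclose(np.linalg.inv(A), (U / lam) @ U.T)

# (4) determinant: |A| = prod_i lam_i
assert np.allclose(np.linalg.det(A), np.prod(lam))
```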
Accepted answer:

First of all, this hint is valid only in the case where $\mathbf A$ is a symmetric matrix, for all choices of $x$. Fortunately, since $\mathbf A$ is usually taken to be a covariance matrix of some sort in Bishop's book, this assumption is valid in the situations that Bishop cares about. The proof on Wikipedia makes the weaker assumption that $\mathbf A$ is invertible for all choices of $x$, so it is more general.

The fact that $\mathbf A$ is symmetric for all $x$ means that the $\mathbf u_i$'s satisfy the orthogonality relation$$\mathbf u_i^T \mathbf u_j = \delta_{ij}$$ If we differentiate this with respect to $x$, we find $$ \frac{\partial \mathbf u_i^T}{\partial x} \mathbf u_j+\mathbf u_i^T \frac{\partial \mathbf u_j}{\partial x} = 0.$$

To proceed, let's write the determinant in terms of the eigenvalues, $$ | \mathbf A| = \prod_i \lambda_i, $$

so the derivative of the log of the determinant is

$$ \frac{\partial}{\partial x} \log | \mathbf A | = \sum_i \lambda_i^{-1}\frac{\partial \lambda_i }{\partial x}.$$

Now let's tackle the right-hand side of the identity. We have $$ \mathbf A^{-1} = \sum_i \lambda_i^{-1}\mathbf u_i \mathbf u_i^T$$ $$ \frac{\partial\mathbf A}{\partial x} = \sum_j \frac{\partial \lambda_j}{\partial x} \mathbf u_j \mathbf u_j^T + \sum_{j}\lambda_j \left( \frac{\partial \mathbf u_j}{\partial x}\mathbf u^T_j + \mathbf u_j \frac{\partial \mathbf u^T_j}{\partial x} \right)$$

To help us evaluate traces, observe that $$ {\rm Tr} \left[ \mathbf u_i \mathbf u_i^T \mathbf u_j \mathbf u_j^T\right] = (\mathbf u_i^T \mathbf u_j)(\mathbf u_j^T \mathbf u_i) = \delta_{ij}.$$ \begin{align} {\rm Tr} \left[ \mathbf u_i \mathbf u_i^T \left( \frac{\partial \mathbf u_j}{\partial x} \mathbf u_j^T + \mathbf u_j\frac{\partial \mathbf u_j^T}{\partial x} \right)\right] &= \left(\mathbf u_i^T \frac{\partial \mathbf u_j}{\partial x}\right)(\mathbf u_j^T \mathbf u_i) + (\mathbf u_i^T \mathbf u_j)\left(\frac{\partial \mathbf u_j^T}{\partial x} \mathbf u_i\right) \\ &= \delta_{ij} \left( \frac{\partial \mathbf u_i^T}{\partial x} \mathbf u_j+\mathbf u_i^T \frac{\partial \mathbf u_j}{\partial x}\right) \\ &= 0\end{align}

Hence $$ {\rm Tr} \left[ \mathbf A^{-1} \frac{\partial \mathbf A}{\partial x}\right] = \sum_i \lambda^{-1}_i \frac{\partial \lambda_i}{\partial x},$$ which agrees with our earlier expression for $\partial \log | \mathbf A | / \partial x$.
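The identity itself is also easy to test numerically: compare a central finite difference of $\log|\mathbf A(x)|$ against the trace formula. Here $\mathbf A(x) = \mathbf A_0 + x\,\mathbf S$ is a hypothetical symmetric family chosen just for illustration, so $\partial \mathbf A/\partial x = \mathbf S$:

```python
import numpy as np

# Numerical check of (C.22): d/dx ln|A(x)| = Tr(A^{-1} dA/dx),
# for an illustrative symmetric family A(x) = A0 + x * S.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A0 = B @ B.T + 5 * np.eye(5)        # symmetric positive definite base
S = rng.standard_normal((5, 5))
S = S + S.T                         # dA/dx = S, kept symmetric

def logdet(x):
    # slogdet returns (sign, log|det|); we want the log-magnitude part
    return np.linalg.slogdet(A0 + x * S)[1]

x, h = 0.3, 1e-6
lhs = (logdet(x + h) - logdet(x - h)) / (2 * h)   # central difference
rhs = np.trace(np.linalg.inv(A0 + x * S) @ S)     # Tr(A^{-1} dA/dx)
assert np.allclose(lhs, rhs, atol=1e-5)
```

Note that the trace formula holds for any invertible $\mathbf A(x)$ (this is Jacobi's formula); the symmetry assumption is only needed to make Bishop's eigenvector-based hint work.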