Optimizing a logdet function with respect to a scalar, and deriving its Hessian


Consider the logdet function $\mathcal{L}(\gamma)$, $$ \mathcal{L}(\gamma) = \log\vert \mathbf{I} + \gamma\mathbf{S} \vert - \mathbf{q}^T(\gamma^{-1}\mathbf{I} + \mathbf{S})^{-1} \mathbf{q}, $$ where $\mathbf{S}$ is a symmetric positive semi-definite matrix, $\mathbf{q}$ is a column vector, and $\mathbf{I}$ is the identity matrix.

Can I solve $\frac{\partial \mathcal{L}(\gamma)}{\partial \gamma}=0$ in closed form? And how can I derive the Hessian $H = \frac{\partial^2 \mathcal{L}(\gamma)}{\partial \gamma^2}$?


There is 1 best solution below


Using the multivariate extension of the chain rule (some useful identities are on Wikipedia under "Matrix Calculus"), we can compute $\frac{\partial L(\gamma)}{\partial \gamma}$:

$$ \begin{aligned}
\frac{\partial L(\gamma)}{\partial \gamma} &= \frac{\partial}{\partial \gamma} \log|I + \gamma S| - \frac{\partial }{\partial \gamma}\left[q^T(\gamma^{-1}I+S)^{-1}q\right]\\
&= \operatorname{tr}\left((I+\gamma S)^{-1}S\right) - \operatorname{tr}\left(qq^T\frac{\partial }{\partial \gamma}(\gamma^{-1}I+S)^{-1}\right)\\
&= \operatorname{tr}\left(S^{-1/2}(S^{-1}+\gamma I)^{-1}S^{-1/2}S\right) + \operatorname{tr}\left(qq^T(\gamma^{-1}I+S)^{-1}(-\gamma^{-2}I)(\gamma^{-1}I+S)^{-1}\right) \\
&= \operatorname{tr}\left((S^{-1}+\gamma I)^{-1}\right)-\gamma^{-2}q^T(\gamma^{-1}I+S)^{-2}q\\
&= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - \gamma^{-2}q^T(\gamma^{-1}I+S)^{-2}q \\
&= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - q^T(I+\gamma S)^{-2}q
\end{aligned} $$

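where $\lambda_i$ are the eigenvalues of $S$ (the steps that use $S^{-1}$ assume $S$ is invertible; for a zero eigenvalue, read $(\lambda_i^{-1}+\gamma)^{-1}$ as $\lambda_i/(1+\gamma\lambda_i) = 0$). Now, let the diagonalization of $S$ be given by $S = Q^{-1}\Sigma Q$. Then: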

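$$ \begin{aligned}
\frac{\partial L(\gamma)}{\partial \gamma} &= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - q^T(I+\gamma Q^{-1}\Sigma Q)^{-2}q \\
&= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - \operatorname{tr}\left(qq^T(I+\gamma Q^{-1}\Sigma Q)^{-2}\right) \\
&= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - \operatorname{tr}\left(qq^TQ^{-1}(I+\gamma \Sigma )^{-2}Q\right) \\
&= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - \operatorname{tr}\left(Qqq^TQ^{-1}(I+\gamma \Sigma )^{-2}\right) \\
&= \sum_i(\lambda_i^{-1}+\gamma)^{-1} - \sum_i \dfrac{Z_{ii}}{(1+\gamma\lambda_i)^2}
\end{aligned} $$
where $Z_{ii}$ are the diagonal entries of $Qqq^TQ^{-1}$. The third line uses $I+\gamma Q^{-1}\Sigma Q = Q^{-1}(I+\gamma\Sigma)Q$, so that $(I+\gamma Q^{-1}\Sigma Q)^{-2} = Q^{-1}(I+\gamma\Sigma)^{-2}Q$. Since $S$ is symmetric, $Q$ can be chosen orthogonal, in which case $Z_{ii} = (Qq)_i^2$. Setting the derivative to zero and putting it in a more readable form, we have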

$$ \begin{gathered}
\sum_i \dfrac{Z_{ii}}{(1+\gamma\lambda_i)^2} = \sum_i \dfrac{\lambda_i}{1+\gamma\lambda_i}\\
\sum_i \dfrac{\lambda_i(1+\gamma\lambda_i) - Z_{ii}}{(1+\gamma\lambda_i)^2} = 0
\end{gathered} $$
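
The condition above is a rational equation in $\gamma$: clearing denominators gives a polynomial whose degree grows with the number of distinct eigenvalues, so in general one should not expect a closed-form root, and a numerical solve is the practical route. The following is only an illustrative sketch and not part of the derivation. It works directly in the eigenbasis, with hypothetical eigenvalues `lam` and weights `z2` playing the role of $Z_{ii}$ (for an orthogonal eigenbasis these are the squared components of $q$ in that basis; in practice both could come from something like `numpy.linalg.eigh`). The helper names and the sample numbers are assumptions made for the example. It checks the derivative against a central finite difference and then finds a root by bisection.

```python
import math


def objective(gamma, lam, z2):
    """L(gamma) in the eigenbasis:
    sum_i [ log(1 + gamma*lam_i) - gamma*z2_i / (1 + gamma*lam_i) ]."""
    return sum(
        math.log(1.0 + gamma * l) - gamma * z / (1.0 + gamma * l)
        for l, z in zip(lam, z2)
    )


def gradient(gamma, lam, z2):
    """dL/dgamma = sum_i [ lam_i/(1 + gamma*lam_i) - z2_i/(1 + gamma*lam_i)^2 ]."""
    return sum(
        l / (1.0 + gamma * l) - z / (1.0 + gamma * l) ** 2
        for l, z in zip(lam, z2)
    )


def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # root lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical data: eigenvalues of S and the weights Z_ii.
lam = [0.5, 1.0, 2.0]
z2 = [4.0, 1.0, 9.0]

# Central finite-difference check of the derivative at an arbitrary point.
g0, h = 0.7, 1e-6
fd = (objective(g0 + h, lam, z2) - objective(g0 - h, lam, z2)) / (2.0 * h)
grad_error = abs(fd - gradient(g0, lam, z2))

# The derivative changes sign on [0, 100] for this data, so bisection applies.
gamma_star = bisect_root(lambda g: gradient(g, lam, z2), 0.0, 100.0)

print("finite-difference mismatch:", grad_error)
print("stationary gamma:", gamma_star)
print("derivative at stationary gamma:", gradient(gamma_star, lam, z2))
```

Bisection only needs a bracket on which the derivative changes sign. For this data the derivative at $\gamma = 0$ equals $\sum_i \lambda_i - \sum_i Z_{ii} < 0$, and it is positive for large $\gamma$. Other data may have no positive root or several, so the bracket has to be checked case by case.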

EDIT: Assuming the above calculations are correct, we can compute the Hessian.

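$$ \begin{aligned}
\frac{\partial^2 L(\gamma)}{\partial \gamma^2} &= \frac{\partial}{\partial \gamma} \left[\sum_i(\lambda_i^{-1}+\gamma)^{-1} - q^T(I+\gamma S)^{-2}q \right]\\
&= -\sum_i(\lambda_i^{-1}+\gamma)^{-2} - \frac{\partial}{\partial \gamma}\, q^T(I+\gamma S)^{-2}q\\
&= -\sum_i(\lambda_i^{-1}+\gamma)^{-2} - \operatorname{tr}\left(\frac{\partial\, (q^TUq)}{\partial U}\, \frac{\partial U}{\partial \gamma}\right), \qquad U(\gamma) = (I+\gamma S)^{-2}\\
&= -\sum_i(\lambda_i^{-1}+\gamma)^{-2} - \operatorname{tr}\left(qq^T \frac{\partial}{\partial \gamma}(I+\gamma S)^{-2}\right) \\
&= -\sum_i(\lambda_i^{-1}+\gamma)^{-2} + \operatorname{tr}\left(qq^T (I+\gamma S)^{-1}\left((I+\gamma S)^{-1}S + S(I+\gamma S)^{-1}\right)(I+\gamma S)^{-1}\right)
\end{aligned} $$

The final step comes from the product rule together with $\frac{\partial}{\partial \gamma}(I+\gamma S)^{-1} = -(I+\gamma S)^{-1}S(I+\gamma S)^{-1}$. The derivative of $(I+\gamma S)^{-2}$ therefore carries a minus sign, which turns the trace term positive. Because $S$ commutes with $(I+\gamma S)^{-1}$, that trace term also equals $2q^TS(I+\gamma S)^{-3}q$.

If we don't want the eigenvalue part, we can write $$ \frac{\partial^2 L(\gamma)}{\partial \gamma^2} = -\operatorname{tr}\left((S^{-1}+\gamma I)^{-2}\right) + \operatorname{tr}\left(qq^T (I+\gamma S)^{-1}\left((I+\gamma S)^{-1}S + S(I+\gamma S)^{-1}\right)(I+\gamma S)^{-1}\right) $$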
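
As a sanity check on the sign of the second term, one can differentiate the eigenvalue form of the first derivative term by term, which gives $\frac{\partial^2 L}{\partial \gamma^2} = -\sum_i \frac{\lambda_i^2}{(1+\gamma\lambda_i)^2} + 2\sum_i \frac{\lambda_i Z_{ii}}{(1+\gamma\lambda_i)^3}$. The sketch below is illustrative only: the eigenvalues `lam`, the weights `z2` standing in for $Z_{ii}$, and the function names are all hypothetical. It compares this expression with a finite difference of the first derivative, then uses it in a few Newton steps on $\frac{\partial L}{\partial \gamma} = 0$ and to classify the stationary point found.

```python
def gradient(gamma, lam, z2):
    """dL/dgamma = sum_i [ lam_i/(1 + gamma*lam_i) - z2_i/(1 + gamma*lam_i)^2 ]."""
    return sum(
        l / (1.0 + gamma * l) - z / (1.0 + gamma * l) ** 2
        for l, z in zip(lam, z2)
    )


def hessian(gamma, lam, z2):
    """d^2L/dgamma^2 =
    sum_i [ -lam_i^2/(1 + gamma*lam_i)^2 + 2*lam_i*z2_i/(1 + gamma*lam_i)^3 ]."""
    return sum(
        -(l / (1.0 + gamma * l)) ** 2 + 2.0 * l * z / (1.0 + gamma * l) ** 3
        for l, z in zip(lam, z2)
    )


# Hypothetical data: eigenvalues of S and the weights Z_ii.
lam = [0.5, 1.0, 2.0]
z2 = [4.0, 1.0, 9.0]

# Central finite difference of the first derivative at an arbitrary point.
g0, h = 0.7, 1e-5
fd = (gradient(g0 + h, lam, z2) - gradient(g0 - h, lam, z2)) / (2.0 * h)
hess_error = abs(fd - hessian(g0, lam, z2))

# Newton iteration on dL/dgamma = 0, started at gamma = 0.
# Convergence is not guaranteed in general; it happens to work for this data.
gamma = 0.0
for _ in range(50):
    step = gradient(gamma, lam, z2) / hessian(gamma, lam, z2)
    gamma -= step
    if abs(step) < 1e-14:
        break

# The sign of the second derivative classifies the stationary point.
kind = "local minimum" if hessian(gamma, lam, z2) > 0 else "local maximum or degenerate"

print("finite-difference mismatch:", hess_error)
print("stationary gamma:", gamma)
print("second derivative there:", hessian(gamma, lam, z2), "->", kind)
```

Since $\gamma$ is a scalar, the "Hessian" here is a single number, so the Newton step is just a division and no linear solve is needed.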