Second Derivative with respect to a Matrix


I have a question regarding a (second-order) derivative with respect to a matrix. I ran into it while calculating Fisher information, but I think the context is not essential to the question.

Here is the derivative:

$$ \frac{\partial}{\partial \Sigma} \Sigma^{-1}A\Sigma^{-1} $$

where $\Sigma$ is a covariance matrix (symmetric, positive semi-definite), and $A = (x_i - \mu_0)(x_i -\mu_0)^T$; we may simply write $A$ while keeping in mind that $A$ is symmetric.

Before posting this question, I searched on Google and found several sources that are useful and relevant but do not answer my question directly:

  1. https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/MatrixCookBook.pdf
  2. Second order derivative of the inverse matrix operator

Consequently, I made a coarse attempt to derive it myself, but I am not confident that it is correct.

====================================================================

Consider a very small $\delta\Sigma$ (small enough that $\|\Sigma^{-1}\delta\Sigma\| < 1$, so the Neumann series below converges):

\begin{align*}
(\Sigma + \delta\Sigma)^{-1}A(\Sigma+\delta\Sigma)^{-1}
&= [\Sigma(I+\Sigma^{-1}(\delta\Sigma))]^{-1}A[(I+(\delta\Sigma)\Sigma^{-1})\Sigma]^{-1}\\
&= (I+\Sigma^{-1}(\delta\Sigma))^{-1}\Sigma^{-1}A\Sigma^{-1}(I+(\delta\Sigma)\Sigma^{-1})^{-1}\\
&= \Big(\sum_{n=0}^\infty(-1)^n[\Sigma^{-1}(\delta\Sigma)]^n\Big)\Sigma^{-1}A\Sigma^{-1}\Big(\sum_{n=0}^\infty(-1)^n[(\delta\Sigma)\Sigma^{-1}]^n\Big)\\
&\approx (I-\Sigma^{-1}(\delta\Sigma))\Sigma^{-1}A\Sigma^{-1}(I-(\delta\Sigma)\Sigma^{-1})\\
&= \Sigma^{-1}A\Sigma^{-1} - \Sigma^{-1}(\delta\Sigma)\Sigma^{-1}A\Sigma^{-1}-\Sigma^{-1}A\Sigma^{-1}(\delta\Sigma)\Sigma^{-1}\\
&\quad + \Sigma^{-1}(\delta\Sigma)\Sigma^{-1}A\Sigma^{-1}(\delta\Sigma)\Sigma^{-1}
\end{align*}

Then, identifying the first-order part of the change (the directional derivative in the direction $\delta\Sigma$) and dropping the quadratic term, we may write

\begin{align*}
\left(\frac{\partial}{\partial \Sigma} \Sigma^{-1}A\Sigma^{-1}\right)[\delta\Sigma]
&= (\Sigma + \delta\Sigma)^{-1}A(\Sigma+\delta\Sigma)^{-1} - \Sigma^{-1}A\Sigma^{-1} + O(\|\delta\Sigma\|^2)\\
&= -\Sigma^{-1}(\delta\Sigma)\Sigma^{-1}A\Sigma^{-1}-\Sigma^{-1}A\Sigma^{-1}(\delta\Sigma)\Sigma^{-1}
\end{align*}
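To gain some confidence in this first-order term, here is a quick numerical sanity check (a minimal NumPy sketch I put together for this question; the dimension, seed, and matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random symmetric positive definite Sigma and symmetric rank-one A.
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)
x = rng.standard_normal((n, 1))
A = x @ x.T  # plays the role of (x_i - mu_0)(x_i - mu_0)^T

# Small symmetric perturbation.
dS = rng.standard_normal((n, n))
dS = 1e-5 * (dS + dS.T)

Si = np.linalg.inv(Sigma)
F = lambda S: np.linalg.inv(S) @ A @ np.linalg.inv(S)

exact_change = F(Sigma + dS) - F(Sigma)
first_order = -Si @ dS @ Si @ A @ Si - Si @ A @ Si @ dS @ Si

# Residual should be O(||dS||^2), i.e. orders of magnitude below ||dS||.
print(np.linalg.norm(exact_change - first_order))
```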

(somehow by magic, or really just by speculating, I then guess) $$ \frac{\partial}{\partial \Sigma} \Sigma^{-1}A\Sigma^{-1} = - \Sigma^{-2}A\Sigma^{-1}-\Sigma^{-1}A\Sigma^{-2} $$
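For what it is worth, this guessed closed form does not pass the same numerical test, at least not under the only obvious way of applying a single $n\times n$ matrix to $\delta\Sigma$ (continuing the snippet above):

```python
# Continuing the snippet above: apply the guessed matrix to dS by
# ordinary matrix multiplication and compare with the exact change.
guess = -Si @ Si @ A @ Si - Si @ A @ Si @ Si  # -S^{-2} A S^{-1} - S^{-1} A S^{-2}
print(np.linalg.norm(exact_change - guess @ dS))  # comparable to ||dS||, not O(||dS||^2)
```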

====================================================================

I have a feeling that I am close, but not quite there yet. What I am really hoping to get from this question is the final form of the derivative.

Thank you so much for all of your time!

p.s.:

  1. You do not have to follow my trail of thought (which may well be wrong); feel free to just show the correct way of doing this.

  2. I call this a second-order derivative because $\Sigma^{-1}A\Sigma^{-1}$ is what I obtained by taking the first derivative of $(x_i-\mu_0)^T\Sigma^{-1}(x_i-\mu_0)$ with respect to $\Sigma$, and yes, all you smart people may have realized this comes from the multivariate normal.

====================================================================

A month after posting this question, I managed to find a great reference that I would like to share. For those who have a similar question, here is a book that will give you great insight (and whose approach closely resembles the method presented by @greg):

"Matrix Differential Calculus with applications in statistics" by Magnus and Neudecker.

Take a look at its Chapter 2, which offers great explanations (and examples) of the Kronecker product and the vec operator, two important concepts when dealing with matrix differentials.
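As a concrete taste of those two concepts, the workhorse identity is $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$; here is a small NumPy check (the shapes are arbitrary, and `vec` is the column-stacking operator):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

vec = lambda M: M.reshape(-1, order="F")  # column-stacking vec operator

# vec(A X B) == (B^T kron A) vec(X)
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))  # True
```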

====================================================================

BEST ANSWER

For ease of typing, let's use the notation $$\eqalign{ X &= \Sigma \cr A:X &= \operatorname{tr}(A^TX) \quad \{\text{trace/Frobenius product}\} \cr }$$
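A quick NumPy illustration of this notation, with arbitrary $3\times 3$ matrices (the Frobenius product is just the sum of elementwise products):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))

# A : X  =  tr(A^T X)  =  sum of elementwise products
print(np.isclose(np.trace(A.T @ X), np.sum(A * X)))  # True
```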

Now we can write the original scalar function and find its differential and gradient
$$\eqalign{ \phi &= A:X^{-1} \cr d\phi &= A:dX^{-1} = -A:X^{-1}\,dX\,X^{-1} = -X^{-1}AX^{-1}:dX \cr G=\frac{\partial\phi}{\partial X} &= -X^{-1}AX^{-1} \cr }$$

To proceed to the Hessian, let's introduce the 4th-order tensor ${\mathcal H}$ with components
$$ {\mathcal H}_{ijkl} = \delta_{ik}\,\delta_{jl} $$

Now we can calculate the differential and gradient of $G$ as
$$\eqalign{ dG &= -dX^{-1}\,AX^{-1} -X^{-1}A\,dX^{-1} \cr &= X^{-1}\,dX\,X^{-1}AX^{-1} + X^{-1}AX^{-1}\,dX\,X^{-1} \cr &= -(X^{-1}\,dX\,G + G\,dX\,X^{-1}) \cr &= -(X^{-1}{\mathcal H}G + G{\mathcal H}X^{-1}):dX \cr \frac{\partial^2\phi}{\partial X^2} = \frac{\partial G}{\partial X} &= -(X^{-1}{\mathcal H}G + G{\mathcal H}X^{-1}) \cr }$$

If you are not comfortable with higher-order tensors, you can use vectorization instead: with $g = {\rm vec}(G)$ and $x = {\rm vec}(X)$,
$$\eqalign{ {\rm vec}(dG) &= -{\rm vec}(X^{-1}\,dX\,G + G\,dX\,X^{-1}) \cr dg &= -(G\otimes X^{-1} + X^{-1}\otimes G)\,dx \cr \frac{\partial g}{\partial x} &= -(G\otimes X^{-1} + X^{-1}\otimes G) \cr }$$

NB: In some of these steps, I made use of the fact that $(X,A,G)$ are symmetric matrices.
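The vectorized form is easy to verify numerically against finite differences (a minimal NumPy sketch, assuming symmetric positive definite $X$ and symmetric $A$ as in the derivation above):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)  # symmetric positive definite
A = rng.standard_normal((n, n))
A = A + A.T                  # symmetric

Xi = np.linalg.inv(X)
G = -Xi @ A @ Xi             # gradient of phi = A : X^{-1}

# Claimed Hessian in vectorized (Kronecker) form, an n^2 x n^2 matrix.
H = -(np.kron(G, Xi) + np.kron(Xi, G))

# Finite-difference check: vec(G(X + dX) - G(X)) ~ H @ vec(dX).
dX = rng.standard_normal((n, n))
dX = 1e-6 * (dX + dX.T)
vec = lambda M: M.reshape(-1, order="F")

Gp = -np.linalg.inv(X + dX) @ A @ np.linalg.inv(X + dX)
print(np.linalg.norm(vec(Gp - G) - H @ vec(dX)))  # O(||dX||^2), i.e. tiny
```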