Hessian of a function of a matrix

I'm asking again for some help with matrix calculus.

I am interested in computing the Hessian of a function of matrices, namely: $$f(\Sigma) = \log\left(1 + y^\top\Sigma^{-1}y\right)$$ with $y$ a constant vector and $\Sigma$ a symmetric positive definite matrix.

I can compute the gradient as $$ G = \frac{\partial f}{\partial \Sigma} = \frac{\Sigma^{-1}yy^\top\Sigma^{-1}}{1+y^\top\Sigma^{-1}y}$$ but then I am stuck with the Hessian.

I have seen this question, but the proposed function is simpler and in that case I am able to find an answer (I will post there when I get some time). In the meantime, following this answer I managed to find $$ dG = \frac{-\Sigma^{-1} d\Sigma \Sigma^{-1} (yy^\top) \Sigma^{-1} - \Sigma^{-1} (yy^\top) \Sigma^{-1} d\Sigma \Sigma^{-1}}{1+y^\top\Sigma^{-1}y} + X $$ but $X$ is still a mystery to me.

By applying the product rule, $X$ should be the derivative of the denominator times the numerator, divided by the denominator squared. But what is supposed to come out of this? I would expect a symmetric tensor (of course), but following my first instinct I get $$ X = \frac{\Sigma^{-1}yy^\top\Sigma^{-1} \Sigma^{-1}yy^\top\Sigma^{-1}}{\left(1+y^\top\Sigma^{-1}y\right)^2}\Sigma^{-1}d\Sigma\Sigma^{-1} $$ Here the factor $\Sigma^{-1}d\Sigma\Sigma^{-1}$ is a valid second-order tensor, but it appears to be multiplied by a matrix coefficient. What should I do with it? I first thought of contracting it with the first index of the first $\Sigma^{-1}$, but doing so loses the symmetry of the Hessian.

I am not sure how to work in index notation either. I supposed that maybe the two parts that compose the numerator should contract with each other, but I am shooting in the dark here, as I don't see why this should happen. I must confess that I am a bit lost in the notation here...

I tried this handy tool, but of course it does not work for fourth order tensors.

Thank you everyone!

Best answer

Note that, with $Y = yy^T$,
$$\eqalign{ e^f &= 1+y^T\Sigma^{-1}y \\ &= 1+yy^T:\Sigma^{-1} \\ &= 1+Y:\Sigma^{-1} }$$

The differential is (using the symmetry of $\Sigma$ and $Y$ to move factors across the colon)
$$\eqalign{ de^f &= -Y:\Sigma^{-1}d\Sigma\,\Sigma^{-1} \\ e^fdf &= -\Sigma^{-1}Y\Sigma^{-1}:d\Sigma \\ G &=\frac{\partial f}{\partial\Sigma} = -e^{-f}\Sigma^{-1}Y\Sigma^{-1} \\ \\ dG &= -de^{-f}\Sigma^{-1}Y\Sigma^{-1} -e^{-f}d\Sigma^{-1}Y\Sigma^{-1} -e^{-f}\Sigma^{-1}Yd\Sigma^{-1} \\ &= e^{-f}(df)\Sigma^{-1}Y\Sigma^{-1} +e^{-f}\Sigma^{-1}d\Sigma\,\Sigma^{-1}Y\Sigma^{-1} +e^{-f}\Sigma^{-1}Y\Sigma^{-1}d\Sigma\,\Sigma^{-1} \\ &= -G(G:d\Sigma) - \Sigma^{-1}d\Sigma\,G - G\,d\Sigma\,\Sigma^{-1} }$$

Vectorize the matrix terms (stacking columns)
$$\eqalign{ s&=\operatorname{vec}(\Sigma),\quad \Sigma=\operatorname{Mat}(s) \\ g&=\operatorname{vec}(G),\quad G=\operatorname{Mat}(g) }$$
and write the equation as
$$\eqalign{ dg &= -gg^Tds -(G\otimes\Sigma^{-1})ds -(\Sigma^{-1}\otimes G)ds \\ H=\frac{\partial g}{\partial s} &= -gg^T - (G\otimes\Sigma^{-1}) - (\Sigma^{-1}\otimes G) \\ dg &= H\,ds,\qquad dG = \operatorname{Mat}(H\,ds) }$$
where $\otimes$ denotes the Kronecker product, and a colon represents the trace/Frobenius product
$$\eqalign{ A:B = \operatorname{Tr}(A^TB) }$$
The Hessian has been calculated as the matrix $H$.
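The closed forms for $G$ and $H$ can be sanity-checked against central finite differences. The following is a minimal NumPy sketch and not part of the original derivation: the test point, the random seed, the step size `eps`, and the helper names `vec`, `f`, `grad`, and `hess` are all assumptions made for illustration. It assumes a column-stacking `vec` and perturbs each entry of $\Sigma$ independently, that is, without enforcing the symmetry constraint, which matches how the differential was taken.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)   # arbitrary symmetric positive definite test point
y = rng.standard_normal(n)
Y = np.outer(y, y)                # Y = y y^T

def vec(M):
    # Column-stacking vectorization, so that vec(A X B) = (B^T kron A) vec(X).
    return M.reshape(-1, order="F")

def f(S):
    # f(S) = log(1 + y^T S^{-1} y)
    return np.log1p(y @ np.linalg.solve(S, y))

def grad(S):
    # G = -exp(-f) S^{-1} Y S^{-1}
    Sinv = np.linalg.inv(S)
    return -np.exp(-f(S)) * Sinv @ Y @ Sinv

def hess(S):
    # H = -g g^T - (G kron S^{-1}) - (S^{-1} kron G)
    Sinv = np.linalg.inv(S)
    G = grad(S)
    g = vec(G)
    return -np.outer(g, g) - np.kron(G, Sinv) - np.kron(Sinv, G)

eps = 1e-6                        # assumed finite-difference step
G_fd = np.zeros((n, n))
H_fd = np.zeros((n * n, n * n))
for l in range(n):
    for k in range(n):
        E = np.zeros((n, n))
        E[k, l] = eps             # perturb the single entry Sigma_{kl}
        G_fd[k, l] = (f(Sigma + E) - f(Sigma - E)) / (2 * eps)
        # Entry (k, l) sits at position k + n*l of the column-stacked vector.
        H_fd[:, k + n * l] = vec(grad(Sigma + E) - grad(Sigma - E)) / (2 * eps)

grad_err = np.max(np.abs(G_fd - grad(Sigma)))
hess_err = np.max(np.abs(H_fd - hess(Sigma)))
print("max gradient error:", grad_err)
print("max Hessian error: ", hess_err)
```

Both errors should be small, at the level of finite-difference accuracy, if the formulas and the `vec` convention agree. Switching `vec` to row-major order would swap the roles of the two Kronecker factors.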
If required, the components of the fourth-order Hessian tensor ${\cal H}$ can be calculated as follows. $$\eqalign{ G_{ij} &= e_ie_j^T:G \\&= {\rm vec}(e_ie_j^T):g \\&= (e_j\otimes e_i):g \\ \\ \Sigma_{kl} &= (e_l\otimes e_k):s \\ \\ {\cal H}_{ijkl} &= \frac{\partial G_{ij}}{\partial \Sigma_{kl}} \\ &= (e_j\otimes e_i)(e_l\otimes e_k)^T:\left(\frac{\partial g}{\partial s}\right)\\ &= (e_j\otimes e_i)(e_l\otimes e_k)^T:H \\ }$$ where the $\{e_k\}$ are the standard basis vectors.
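The component formula can be sketched numerically as follows. This is an illustrative NumPy example under stated assumptions rather than a definitive implementation: the test data, the names `H4`, `component`, and `H4_index`, and the use of a column-major reshape are choices made here. Indices are zero-based in the code. The cross-check `H4_index` uses the index form $-G_{ij}G_{kl} - \Sigma^{-1}_{ik}G_{jl} - G_{ik}\Sigma^{-1}_{jl}$, which is read off from $dG = -G(G:d\Sigma) - \Sigma^{-1}d\Sigma\,G - G\,d\Sigma\,\Sigma^{-1}$ for symmetric $G$ and $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)   # arbitrary symmetric positive definite test point
y = rng.standard_normal(n)

Sinv = np.linalg.inv(Sigma)
G = -Sinv @ np.outer(y, y) @ Sinv / (1.0 + y @ Sinv @ y)
g = G.reshape(-1, order="F")      # column-stacking vec(G)
H = -np.outer(g, g) - np.kron(G, Sinv) - np.kron(Sinv, G)

# With column-stacking, row i + n*j of H belongs to G_ij and column k + n*l
# to Sigma_kl, so a column-major reshape yields H4[i, j, k, l] directly.
H4 = H.reshape((n, n, n, n), order="F")

def component(i, j, k, l):
    # (e_j kron e_i)^T H (e_l kron e_k), the basis-vector formula from the text.
    e = np.eye(n)
    return np.kron(e[j], e[i]) @ H @ np.kron(e[l], e[k])

# Index-notation cross-check derived from dG (assumes symmetric G and Sigma).
H4_index = (-np.einsum("ij,kl->ijkl", G, G)
            - np.einsum("ik,jl->ijkl", Sinv, G)
            - np.einsum("ik,jl->ijkl", G, Sinv))

print(np.allclose(H4, H4_index))
print(np.isclose(H4[0, 1, 2, 1], component(0, 1, 2, 1)))
```

The reshape avoids forming the basis vectors explicitly; the `component` helper is kept only to mirror the formula term by term.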