Hessian of matrix fractional function

363 Views Asked by At

I am interested in the matrix fractional function (from $GL_n(\mathbb{R})$ to $\mathbb{R}$) : $S \mapsto x^\top S^{-1} x$, where $x \in \mathbb{R}^n$ is fixed.

By computing $f(S + E) - f(S)$ for a small $E$ I am able to show that its gradient at $S$ is $-S^{-1} x x^\top S^{-1}$, but I am stuck when it comes to computing its Hessian (which is a fourth order tensor). Any hint ?

EDIT: Pushing the Taylor expansion leads me to: \begin{align}x^\top(S + E)^{-1} x - x^\top S ^{-1} x &= x^\top(Id + S^{-1} E)^{-1} S^{-1} x -x^\top S^{-1} x \\&= x^\top(Id - S^{-1} E + S^{-1} E S^{-1}E) S^{-1} x -x^\top S^{-1} x +o(||E||^2) \\&= -x^\top S^{-1} E S^{-1} x + x^\top S^{-1} ES^{-1} E S^{-1} x \end{align}

and $-x^\top S^{-1} E S^{-1} x = - tr _,x^\top S^{-1} E S^{-1} x = - tr \, S^{-1} x x^\top S^{-1} E$ which gives me the gradient, and the expression of the second order term, but not the Hessian.

1

There are 1 best solutions below

6
On

Use the Chain rule $$f(S) = g\circ \mathcal{I}(S)$$ with $\mathcal{I}(S) =S^{-1}$ and $g(S) =x^\top Sx$ and see that, $$g: S\mapsto x^\top Sx $$ is linear so $Dg(S)(E) =x^\top Ex$.

We know that $$\mathcal{DI}(S)(V) = -S^{-1}VS^{-1}$$

Hence, for $H $ fixed,

$$\mathcal{D}f(S)(H) =\mathcal{D}g \circ \mathcal{DI}(S)(H) =\color{red}{x^\top S^{-1}HS^{-1} E x :=h\circ \mathcal{I}(S)}$$

Where $$h: A\mapsto -x^\top AHA x$$ Observe that $h$ is quadratic map hence

$$\mathcal{D}h(A)(K) = -x^\top KHA x-x^\top AHK x$$

Therefore, $$\mathcal{D}^2f(S)(H)(K) =\mathcal{D}h \circ\mathcal{DI}(S)(K) =-x^\top [\mathcal{DI}(S)(K)]HS x-x^\top SH [\mathcal{DI}(S)(K)] x \\ = \mathcal{D}^2f(S)(H)(K) = x^\top SH S^{-1}KS^{-1} x+ x^\top S^{-1}KS^{-1}HS x . $$

i.e

$$\color{red}{\mathcal{D}^2f(S)(H)(K) = x^\top SH S^{-1}KS^{-1} x+ x^\top S^{-1}KS^{-1}HS x }$$