Are Hessian matrices always symmetric?


The objective function of interest is $$ \phi = \log|PWP^T| + \operatorname{tr}((PWP^T)^{-1}PVP^T), $$ where $P = J + XU^T$ and $V$, $J$ and $U^T$ are known matrices. I assume that $V$ and $W$ are positive definite.

The first partial derivative of $\phi$ with respect to $X$ is \begin{align*} Y^{-1}(JWU + JVU) + Y^{-1}X(U^TWU + U^TVU) - Y^{-1}ZY^{-1}(JWU + XU^TWU) \end{align*} where $Y = PWP^{T}$ and $Z = PVP^T$.

Setting the first partial derivative with respect to $W$ to zero gives $$ PWP^T = PVP^T. $$

Hence, combining the two first-order conditions gives $X = -JVU(U^TVU)^{-1}$.

If I compute $\nabla_{xx}\phi$ and evaluate it at the solution, I get $$ [(U^TWUX^T + U^TWJ^T)Y^{-1} \otimes Y^{-1}](I + K)[(U^TWUX^T + U^TWJ^T)^T \otimes I] + (U^TVU \otimes Y^{-1}) $$

where $K$ is the commutation matrix. However, I can't see that this expression is symmetric, and if it is not, the Hessian would not be symmetric. Must the Hessian always be symmetric?


There are 2 best solutions below

0
On

No, not in general.

The Hessian is symmetric exactly when $\frac {\partial^2 f}{\partial x_i\partial x_j} = \frac {\partial^2 f}{\partial x_j\partial x_i}$ for all $i, j$.

This is guaranteed when the second partial derivatives are continuous (Schwarz's theorem), but it can fail without that assumption.
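A standard counterexample (not part of the original question, added here only as an illustration) shows what can go wrong when the second partial derivatives are not continuous. Take

$$ f(x,y) = \begin{cases} \dfrac{xy(x^2-y^2)}{x^2+y^2}, & (x,y)\neq(0,0),\\[2mm] 0, & (x,y)=(0,0). \end{cases} $$

Directly from the definition of the partial derivatives, $\frac{\partial f}{\partial x}(0,y) = -y$ and $\frac{\partial f}{\partial y}(x,0) = x$, so

$$ \frac{\partial}{\partial y}\frac{\partial f}{\partial x}(0,0) = -1 \neq 1 = \frac{\partial}{\partial x}\frac{\partial f}{\partial y}(0,0). $$

Both mixed partials exist at the origin, but they are not continuous there, and the Hessian at $(0,0)$ is not symmetric.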

2
On

Of course, if you use the gradient and not the derivative, it's difficult to see that your Hessian is symmetric (again the effects of the Matrix Cookbook!!). Here there is no problem because the functions are $C^{\infty}$.

For the sake of simplicity, I do the calculation on $f(X)=\log|PWP^T|=2\log|P|+\log|W|$ (taking $P$ square and invertible, with $|P|$ understood in absolute value). The derivative is the linear function

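$$Df_X:H\in M_n\mapsto 2\operatorname{tr}(P'(H)P^{-1})=2\operatorname{tr}(HU^TP^{-1}),$$

where $P'(H)=HU^T$. The Hessian is the bilinear function

$$D^2f_X:(H,K)\in (M_n)^2\mapsto -2\operatorname{tr}(HU^TP^{-1}P'(K)P^{-1})=-2\operatorname{tr}(HU^TP^{-1}KU^TP^{-1}),$$

where $K$ denotes a second direction in $M_n$ (not the commutation matrix of the question). By the cyclic property of the trace, $\operatorname{tr}(AB)=\operatorname{tr}(BA)$ with $A=HU^TP^{-1}$ and $B=KU^TP^{-1}$, so

$$D^2f_X(H,K)=D^2f_X(K,H).$$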

EDIT. I forgot to say that the Hessian is symmetric at every point (not only at the critical points of the function).
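The symmetry claim can also be checked numerically. The following is a minimal sketch, assuming NumPy and small random square matrices; the sizes, the seed, and the helper names `f`, `second_derivative` and `finite_difference` are illustrative choices, not part of the original answer. It evaluates the bilinear form $-2\operatorname{tr}(HU^TP^{-1}KU^TP^{-1})$ at an arbitrary (non-critical) point $X$, swaps the two directions, and compares against a central finite difference of $f(X)=2\log|P|$.

```python
import numpy as np


def f(X, J, U):
    """f(X) = 2 log|det P| with P = J + X U^T (the constant log|W| is dropped)."""
    _, logabsdet = np.linalg.slogdet(J + X @ U.T)
    return 2.0 * logabsdet


def second_derivative(X, J, U, H, K):
    """Closed form D^2 f_X(H, K) = -2 tr(H U^T P^{-1} K U^T P^{-1})."""
    P_inv = np.linalg.inv(J + X @ U.T)
    A = H @ U.T @ P_inv
    B = K @ U.T @ P_inv
    return -2.0 * np.trace(A @ B)


def finite_difference(X, J, U, H, K, eps=1e-4):
    """Central mixed difference approximating D^2 f_X(H, K)."""
    return (
        f(X + eps * H + eps * K, J, U)
        - f(X + eps * H - eps * K, J, U)
        - f(X - eps * H + eps * K, J, U)
        + f(X - eps * H - eps * K, J, U)
    ) / (4.0 * eps**2)


# Hypothetical test data: J is chosen so that P = J + X U^T stays well conditioned.
rng = np.random.default_rng(0)
n = 4
J = 3.0 * np.eye(n)
U = 0.3 * rng.standard_normal((n, n))
X = 0.3 * rng.standard_normal((n, n))  # an arbitrary point, not a critical point
H = rng.standard_normal((n, n))        # first direction
K = rng.standard_normal((n, n))        # second direction

d2_hk = second_derivative(X, J, U, H, K)
d2_kh = second_derivative(X, J, U, K, H)
d2_fd = finite_difference(X, J, U, H, K)
print(d2_hk, d2_kh, d2_fd)
```

The first two printed values should agree up to rounding, and the third should match them up to finite-difference error. This is only a sanity check on one random instance, not a proof.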