Hessian with respect to a matrix variable for a quadratic inner product


Given the standard matrix inner product, \begin{equation} \begin{aligned} f(\textbf{X}) := & \;\;\;\; {\langle}{\textbf{X}, \textbf{A}\textbf{X}}{\rangle}\\ =& \; \text{tr} (\textbf{X}^{\text{T}} \textbf{A}\textbf{X}),\\ & \textbf{X} \in \mathcal{R}^{n \times r}, \text{ the variable,}\\ & \textbf{A} \in \mathcal{R}^{n \times n}, \text{ a constant matrix that is not necessarily symmetric}. \end{aligned} \end{equation}

I want to calculate the Hessian with respect to ${\textbf{X}}$, which is a matrix rather than a vector. I know how to compute the gradient, which is

\begin{equation} \begin{aligned} & \nabla_\textbf{X} f(\textbf{X})= \begin{cases} {\textbf{2AX}, \; \; \; \; \; \; \; \; \; \text{ if } \textbf{A} = \textbf{A}^{\text{T}}},\\ {\textbf{(A+A}^{\text{T}}) \textbf{X}, \text{ else. }} \end{cases}\\ & \in \mathcal{R}^{n \times r} \end{aligned} \end{equation}

but it is unclear to me how to proceed for a matrix variable. In general, it is hard to find clear, accessible material on this. The definition is on Wikipedia, of course, but for a matrix variable I need a small example, say $\textbf{X} \in \mathcal{R}^{3 \times 2}, \; \textbf{A} \in \mathcal{R}^{3 \times 3}$, to make it clear.

For this small example, the Hessian matrix should have dimension $6 \times 6$, i.e. $\textbf{H} \in \mathcal{R}^{6 \times 6}$, as far as I can tell.

Hopefully there is $\textbf{a neat mathematical expression for the resulting Hessian matrix}$ of this function, as there is for the gradient.

$\textbf{A clear example would be appreciated.}$ Thanks in advance.

This would help many people, because the topic is fundamental but not easy to find explained.

Best answer

Since you know how to calculate the gradient, let's start by taking its differential. Writing $S=A+A^T$ covers both of your cases, since $S=2A$ when $A$ is symmetric. $$\eqalign{ S &= A+A^T \cr G &= \nabla f = SX \cr dG &= S\,dX \cr }$$ There are two ways to proceed: vectorize the equation or use tensors.

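Vectorization flattens the $(dG,dX)$ matrices into vectors and the Hessian into a matrix. $$\eqalign{ dg &= (I\otimes S)\,dx \cr H = \frac{\partial g}{\partial x} &= I\otimes S \cr }$$ where $\otimes$ represents the Kronecker product, $I$ is the $r\times r$ identity, and $\,\,dx={\rm vec}(dX)$, $\,dg={\rm vec}(dG)$ are formed by stacking columns. For $X\in{\mathcal R}^{n\times r}$ the matrix $H$ is therefore $nr\times nr$, which is $6\times 6$ in your $3\times 2$ example.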
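For the small example requested in the question ($n=3$, $r=2$), the formula gives the block-diagonal matrix $H = I_2\otimes S = \begin{pmatrix} S & 0 \cr 0 & S \cr \end{pmatrix}$. The following is a minimal NumPy sketch that checks this against a finite-difference Hessian. The random matrix `A`, the helper `numerical_hessian`, and the step size `h` are illustrative choices, not part of the original derivation, and column-major (`order="F"`) reshaping is assumed so that it matches the column-stacking ${\rm vec}$ convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 3, 2
A = rng.standard_normal((n, n))   # deliberately not symmetric
S = A + A.T

def f(x_vec):
    """f(X) = tr(X^T A X), with X given as its column-major vectorization."""
    X = x_vec.reshape((n, r), order="F")
    return np.trace(X.T @ A @ X)

def numerical_hessian(func, x, h=0.5):
    """Central second differences (hypothetical helper).

    For a quadratic function this is exact up to rounding for any step h.
    """
    d = x.size
    steps = np.eye(d) * h
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (
                func(x + steps[i] + steps[j])
                - func(x + steps[i] - steps[j])
                - func(x - steps[i] + steps[j])
                + func(x - steps[i] - steps[j])
            ) / (4.0 * h * h)
    return H

x0 = rng.standard_normal(n * r)          # arbitrary evaluation point
H_numeric = numerical_hessian(f, x0)
H_formula = np.kron(np.eye(r), S)        # I_r kron S

print(H_formula.shape)                                  # (6, 6)
print(np.allclose(H_numeric, H_formula, atol=1e-8))     # True
```

Because $f$ is quadratic, the Hessian does not depend on the point `x0`, so any evaluation point should give the same matrix.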

But the true Hessian is a fourth-order tensor. $$\eqalign{ dG &= S{\mathcal E}:dX \cr {\mathcal H} = \frac{\partial G}{\partial X} &= S{\mathcal E} \cr }$$ where ${\mathcal E}$ is a tensor constant whose components can be written in terms of Kronecker deltas $$\eqalign{ {\mathcal E}_{ijkl} &= \delta_{ik}\delta_{jl} \cr }$$ The colon represents the double-contraction product $$B={\mathcal E}:X \implies B_{ij}=\sum_k\sum_l {\mathcal E}_{ijkl}\,X_{kl}$$ while juxtaposition represents the single-contraction product.

The components of the Hessian are equal to $$ {\mathcal H}_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}} = \sum_m S_{im}{\mathcal E}_{mjkl} = S_{ik}\delta_{jl} $$
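As a rough illustration of the component formula, the sketch below builds the fourth-order tensor with `np.einsum` for $n=3$, $r=2$, checks that the double contraction with an arbitrary $dX$ reproduces $S\,dX$, and checks that flattening the index pairs $(i,j)$ and $(k,l)$ in column-major order recovers $I\otimes S$. The random `A` and `dX` and the variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 3, 2
A = rng.standard_normal((n, n))   # arbitrary, not symmetric
S = A + A.T

# H4[i, j, k, l] = S[i, k] * delta[j, l], shape (n, r, n, r)
H4 = np.einsum("ik,jl->ijkl", S, np.eye(r))

# Double contraction H : dX should reproduce dG = S dX.
dX = rng.standard_normal((n, r))
dG = np.einsum("ijkl,kl->ij", H4, dX)
print(np.allclose(dG, S @ dX))

# Flattening (i, j) -> row and (k, l) -> column in column-major order
# should give the vectorized Hessian I_r kron S.
H_mat = H4.reshape((n * r, n * r), order="F")
print(np.allclose(H_mat, np.kron(np.eye(r), S)))
```

Both checks should print `True`, which is one way to see that the tensor form and the Kronecker form carry the same information, only arranged differently.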