Given the standard matrix inner product, \begin{equation} \begin{aligned} f(\textbf{X}) := & \;\;\;\; {\langle}{\textbf{X}, \textbf{A}\textbf{X}}{\rangle}\\ =& \; \text{tr} (\textbf{X}^{\text{T}} \textbf{A}\textbf{X}),\\ & \textbf{X} \in \mathcal{R}^{n \times r}, \text{ the variable,}\\ & \textbf{A} \in \mathcal{R}^{n \times n}, \text{ a constant matrix that is not necessarily symmetric}. \end{aligned} \end{equation}
I want to calculate the Hessian with respect to ${\textbf{X}}$, which is a matrix, not a vector. I know how to compute the gradient, which is
\begin{equation} \begin{aligned} & \nabla_\textbf{X} f(\textbf{X})= \begin{cases} {\textbf{2AX}, \; \; \; \; \; \; \; \; \; \text{ if } \textbf{A} = \textbf{A}^{\text{T}}},\\ {\textbf{(A+A}^{\text{T}}) \textbf{X}, \text{ else. }} \end{cases}\\ & \in \mathcal{R}^{n \times r} \end{aligned} \end{equation}
but it is unclear to me how to compute the Hessian for a matrix variable. In general, it is hard to find clear, accessible material on this. Of course, the definition is on Wikipedia, but for a matrix variable I need a small example, such as $\textbf{X} \in \mathcal{R}^{3 \times 2}, \; \textbf{A} \in \mathcal{R}^{3 \times 3}$; then it will become clear.
For this small example, the Hessian matrix should be $6 \times 6$, i.e. $\textbf{H} \in \mathcal{R}^{6 \times 6}$, as far as I can tell.
And hopefully there exists $\textbf{a neat mathematical expression for the resulting Hessian matrix}$ of this function, as there is for the gradient.
$\textbf{With a clear example please}$, thanks in advance.
This would help many people, because the topic is fundamental but not easily accessible.
Since you know how to calculate the gradient, let's start by taking the differential of that $$\eqalign{ S &= A+A^T \cr G &= \nabla f = SX \cr dG &= S\,dX \cr }$$ There are two ways to proceed: vectorize the equation or use tensors.
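Vectorization flattens the $(dG,dX)$ matrices into vectors and the Hessian into a matrix. $$\eqalign{ dg &= (I\otimes S)\,dx \cr H = \frac{\partial g}{\partial x} &= I\otimes S \cr }$$ where $\otimes$ represents the Kronecker product, $I$ is the $r\times r$ identity matrix, and $\,dx={\rm vec}(dX)$ stacks the columns of $dX$ into a single vector (likewise $dg={\rm vec}(dG)$). So $H$ is an $nr\times nr$ matrix; for $X\in{\mathbb R}^{3\times 2}$ it is the $6\times 6$ block-diagonal matrix $$H = \pmatrix{S & 0 \cr 0 & S}$$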
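To make the vectorized result concrete, here is a minimal numerical sketch for the $3\times 2$ example from the question. It assumes NumPy and a column-stacking ${\rm vec}$; the random seed, the step size `h`, and the helper names `vec`, `f_of_vec` and `numerical_hessian` are illustrative choices, not part of the derivation. It builds $H = I\otimes S$ and compares it with a finite-difference Hessian of $f$.

```python
import numpy as np

n, r = 3, 2                       # sizes from the question's small example
rng = np.random.default_rng(0)    # arbitrary seed, only for reproducibility
A = rng.standard_normal((n, n))   # generic A, not symmetric
X = rng.standard_normal((n, r))
S = A + A.T


def vec(M):
    """Stack the columns of M (column-major order)."""
    return M.reshape(-1, order="F")


def f_of_vec(x):
    """Evaluate f(X) = tr(X^T A X) from the flattened variable x = vec(X)."""
    Xm = x.reshape((n, r), order="F")
    return np.trace(Xm.T @ A @ Xm)


# Closed-form Hessian: H = I_r (Kronecker) S, shape (n*r, n*r) = (6, 6).
H = np.kron(np.eye(r), S)


def numerical_hessian(func, x, h=1e-2):
    """Central second differences; exact up to rounding for a quadratic."""
    m = x.size
    E = np.eye(m) * h             # row a is the step h along coordinate a
    out = np.empty((m, m))
    for a in range(m):
        for b in range(m):
            out[a, b] = (
                func(x + E[a] + E[b]) - func(x + E[a] - E[b])
                - func(x - E[a] + E[b]) + func(x - E[a] - E[b])
            ) / (4.0 * h * h)
    return out


x = vec(X)
H_fd = numerical_hessian(f_of_vec, x)

print(H.shape)                                     # size of the Hessian
print(np.allclose(H_fd, H, atol=1e-6))             # matches finite differences?
print(np.allclose(H @ x, vec(S @ X)))              # H vec(X) equals vec(gradient)?
print(np.isclose(0.5 * x @ H @ x, f_of_vec(x)))    # f = (1/2) x^T H x?
```

The last two checks rely on $f$ being a pure quadratic: its gradient is linear in $x$ and $f(X)=\tfrac12 x^T H x$. If a row-major flattening were used instead, the factors would swap and the matrix would be $S\otimes I$.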
But the true Hessian is a fourth-order tensor. $$\eqalign{ dG &= S{\mathcal E}:dX \cr {\mathcal H} = \frac{\partial G}{\partial X} &= S{\mathcal E} \cr }$$ where ${\mathcal E}$ is a tensor constant whose components can be written in terms of Kronecker deltas $$\eqalign{ {\mathcal E}_{ijkl} &= \delta_{ik}\delta_{jl} \cr }$$ The colon represents the double-contraction product $$B={\mathcal E}:X \implies B_{ij}=\sum_k\sum_l {\mathcal E}_{ijkl}\,X_{kl}$$ while juxtaposition represents the single-contraction product.
The components of the Hessian are equal to $$ {\mathcal H}_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}} = \sum_m S_{im}\,{\mathcal E}_{mjkl} = S_{ik}\delta_{jl} $$ where the summation index $m$ runs over $1,\dots,n$.
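The component formula can also be checked numerically. The sketch below, again assuming NumPy and the $3\times 2$ sizes from the question (the names `H4`, `H_mat` and the random test matrices are illustrative), builds the fourth-order tensor from $S_{ik}\delta_{jl}$, applies the double contraction, and flattens the tensor to confirm it agrees with the Kronecker form $I\otimes S$ under column-stacking ${\rm vec}$.

```python
import numpy as np

n, r = 3, 2
rng = np.random.default_rng(1)    # arbitrary seed, only for reproducibility
A = rng.standard_normal((n, n))
S = A + A.T
dX = rng.standard_normal((n, r))  # an arbitrary perturbation of X

# Fourth-order Hessian: H4[i, j, k, l] = S[i, k] * delta[j, l]
H4 = np.einsum("ik,jl->ijkl", S, np.eye(r))

# Double contraction H4 : dX, which should reproduce dG = S dX
dG = np.einsum("ijkl,kl->ij", H4, dX)
print(np.allclose(dG, S @ dX))

# Flatten to a matrix. Column-stacking vec maps (i, j) to the index i + n*j,
# so the column index j must vary slowest: reorder axes to (j, i, l, k).
H_mat = H4.transpose(1, 0, 3, 2).reshape(n * r, n * r)
print(np.allclose(H_mat, np.kron(np.eye(r), S)))
```

So the tensor and the $nr\times nr$ matrix carry the same numbers; only the indexing differs.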