What is the Hessian w.r.t. the matrix X of this quadratic function?

I am stuck finding the Hessian w.r.t. the matrix $X \in R^{m \times n}$ of the following function:

$$\frac{1}{2} ||AXB-C||_F^2$$ where $A \in R^{l \times m}$ and $B \in R^{n \times o}$

I got the first order derivative (gradient):

$$A^TAXBB^T-A^TCB^T$$

but I am stuck with the Hessian. I guess it should be something like $A^TABB^T$, but the dimensions don't match: $A^TA$ and $BB^T$ are both square matrices, but of sizes $m$ and $n$ respectively, so I don't see how to multiply them.

I was wondering whether this involves a tensor product, but I am not sure.

Thanks for any explanation.

There is 1 best solution below

To continue with your calculation, let's find the differential of the gradient.

$$\eqalign{ G &= A^TAXBB^T - A^TCB^T \cr dG &= A^TA\,dX\,BB^T \cr }$$

From here, there are two ways to proceed to the Hessian: tensors or vectorization.

Let's introduce the isotropic fourth-order tensor ${\mathcal E}$ whose components can be written as the product of two Kronecker deltas.

$${\mathcal E}_{ijkl} = {\delta}_{ik}\,{\delta}_{jl}$$

Let's also introduce the single-contraction (denoted by juxtaposition) and double-contraction (denoted by a colon) products between tensors.

$$\eqalign{ {\mathcal C} = {\mathcal A}{\mathcal B} &\implies {\mathcal C}_{ijklmn} = \sum_{p} {\mathcal A}_{ijkp}{\mathcal B}_{plmn} \cr {\mathcal C} = {\mathcal A}:{\mathcal B} &\implies {\mathcal C}_{ijmn} = \sum_{k,l} {\mathcal A}_{ijkl}{\mathcal B}_{klmn} \cr }$$

The same rules apply when one factor is a matrix: a single contraction sums over the adjacent index pair, and a double contraction with $dX$ sums over both of its indices.

Use these notations to find the tensor Hessian (${\mathcal H}$).

$$\eqalign{ dG &= A^TA\,dX\,BB^T = A^TA\,{\mathcal E}\,BB^T:dX \cr {\mathcal H} = \frac{\partial G}{\partial X} &= A^TA\,{\mathcal E}\,BB^T \cr }$$

In components, ${\mathcal H}_{ijkl} = (A^TA)_{ik}\,(BB^T)_{jl}$.

Or, we can use vectorization to calculate a matrix Hessian ($H$). This relies on the identity ${\rm vec}(PXQ) = (Q^T\otimes P)\,{\rm vec}(X)$, with ${\rm vec}$ stacking columns, together with the symmetry of $BB^T$.

$$\eqalign{ {\rm vec}(dG) &= {\rm vec}(A^TA\,dX\,BB^T) = \Big(BB^T\otimes A^TA\Big)\,{\rm vec}(dX) \cr H = \frac{\partial\,{\rm vec}(G)}{\partial\,{\rm vec}(X)} &= BB^T\otimes A^TA \cr }$$

The two results are very similar: simply interchange $({\mathcal E}\leftrightarrow\otimes)$ and reverse the order of the factors.
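If it helps, both forms can be checked numerically. The following is only a minimal NumPy sketch, not part of the derivation: the sizes, the random seed, and the helper names `grad` and `vec` are arbitrary choices made for illustration, and `vec` is assumed to stack columns (column-major order), which is the convention the Kronecker identity needs.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, n, o = 4, 3, 2, 5              # arbitrary illustrative sizes
A = rng.standard_normal((l, m))
B = rng.standard_normal((n, o))
C = rng.standard_normal((l, o))

def grad(X):
    """Gradient of 0.5 * ||A X B - C||_F^2 with respect to X."""
    return A.T @ (A @ X @ B - C) @ B.T

def vec(M):
    """Stack the columns of M (column-major order)."""
    return M.reshape(-1, order="F")

AtA = A.T @ A                        # m x m
BBt = B @ B.T                        # n x n

# Matrix Hessian from the vectorized form, shape (m*n, m*n).
H = np.kron(BBt, AtA)

# Tensor Hessian in components: H4[i, j, k, l] = AtA[i, k] * BBt[j, l].
H4 = np.einsum("ik,jl->ijkl", AtA, BBt)

X = rng.standard_normal((m, n))
dX = rng.standard_normal((m, n))

# The gradient is affine in X, so this difference equals dG up to rounding.
dG = grad(X + dX) - grad(X)

# Vectorized form: H vec(dX) should equal vec(dG).
vec_ok = np.allclose(H @ vec(dX), vec(dG))
# Tensor form: the double contraction H4 : dX should equal dG.
tensor_ok = np.allclose(np.einsum("ijkl,kl->ij", H4, dX), dG)
# Flattening the tensor in column-major order should recover the matrix Hessian.
same_ok = np.allclose(H4.reshape(m * n, m * n, order="F"), H)

print(vec_ok, tensor_ok, same_ok)
```

Note that with NumPy's default row-major flattening the factors of the Kronecker product would swap, so the choice of `order="F"` matters here.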