Simplified expression for Frobenius norm of hessian of matrix function

252 Views Asked by At

Let $X \in \mathbb R^{n \times d}$ and $y \in \mathbb R^{n \times 1}$ and $u \in \mathbb R^{k \times 1}$. Define the function $F:\mathbb R^{d \times k} \to \mathbb R$ by $F(W) := y^\top \psi(XW)u$ for some twice-differentiable function $\psi:\mathbb R \to \mathbb R$, and $\psi(A)$ is the matrix with $ij$th entry $\psi(a_{i,j})$.

Question. What is a simplified expression for the Frobenius norm of the hessian $\|\nabla^2 F(W)\|_{Fro}^2$ ?

1

There are 1 best solutions below

3
On BEST ANSWER

$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\R{{\mathbb R}}\def\v{{\rm vec}}\def\d{{\rm diag}}\def\D{{\rm Diag}}$Let a colon denote the trace/Frobenius product, i.e. $$\eqalign{ A:B &= {\rm Tr}(A^TB) = B:A \\ A:A &= \big\|A\big\|_F^2\\ }$$ Define the matrices $$\eqalign{ Z &= XW \qquad&\implies\quad dZ = X\,dW \\ P &= \psi(Z) \qquad&\implies\quad dP = Q\odot dZ \\ Q &= \psi'(Z) \qquad&\implies\quad dQ = R\odot dZ \\ R &= \psi''(Z) \\ }$$ where $(\odot)$ denotes the elementwise/Hadamard product and $\,(\psi',\psi'')\,$ denote the ordinary first and second derivatives of the $\psi$ function.

Write the cost function in terms of the above, then calculate its gradient. $$\eqalign{ F &= yu^T:P \\ dF &= yu^T:dP &({\rm differential}) \\ &= yu^T:Q\odot X\,dW \\ &= Q\odot yu^T:X\,dW \\ &= X^T\left(Q\odot yu^T\right):dW \\ \p{F}{W} &= X^T\left(yu^T\odot Q\right) \;\doteq\; G\qquad&({\rm gradient}) \\ }$$ Next, calculate the differential of $G$. $$\eqalign{ dG &= X^T\left(yu^T\odot dQ\right) \\ &= X^T\left(R\odot yu^T\odot X\,dW\right) \\ }$$ The Hessian is going to be a fourth-order tensor, so we're stuck.

One way to proceed is to note that the component-wise self derivative of $W$ is $$\eqalign{ \p{W}{W_{ij}} &= E_{ij} \;\in\,\R^{d\times k} \\ }$$ where $E_{ij}$ is the matrix with the $(i,j)$ component equal to one and all others equal to zero.

Then the components of the Hessian ${\cal H}$ can be calculated as $$\eqalign{ {\cal H}_{ijk\ell} &= \p{G_{ij}}{W_{k\ell}} \\ &= E_{ij}:\Big(X^T\left(R\odot yu^T\odot XE_{k\ell}\right)\Big) \\ &= x_i^T\big(r_j\odot y\odot x_k\big)\;u_j\delta_{j\ell} \\ }$$ where the vectors $(x_k,r_k)$ are the $k^{th}$ columns of the $(X,R)$ matrices.

Finally, the Frobenius norm can be calculated for this tensor $$\eqalign{ \big\|{\cal H}\big\|^2_F &= \sum_{h=1}^d\sum_{i=1}^k\sum_{j=1}^d\sum_{\ell=1}^k\;{\cal H}_{hij\ell}^2 \\ &= \sum_{h=1}^d\sum_{i=1}^k\sum_{j=1}^d\;\Big(x_h^T\big(r_i\odot y\odot x_j\big)\;u_i\Big)^2 \\\\ }$$ UPDATE

Since you're only interested in the norm, the $(G,W)$ matrices can be flattened into vectors $(g,w)$. This in turn flattens the Hessian tensor into a matrix (i.e. ${\cal H}\to H$) and allows all calculations to be carried out in matrix notation without explicit summations over the components.

For typing convenience, define the variables $$\eqalign{ I &\in\R^{k\times k}&&\big({\rm Identity\,matrix}\big) \\ A &= I\otimes X\quad\quad&\implies\quad&C = AA^T = I\otimes XX^T \\ B &= \D(b)\quad&\implies\quad&b = \v(R\odot yu^T) \\ g &= \v(G)\quad&\implies\quad&w = \v(W) \\ }$$ Then $$\eqalign{ dG &= X^T(R\odot yu^T\odot X\,dW) \\ dg &= (I\otimes X)^T\;\big(b\odot\v(X\,dW)\big) \\ &= (I\otimes X)^TB\;\v(X\,dW) \\ &= (I\otimes X)^TB\,(I\otimes X)\;dw \\ \p{g}{w} &= A^TBA \;\doteq\; H \qquad\big({\rm Hessian\,matrix}\big) \\ }$$ Now compute the matrix norm as usual. $$\eqalign{ \big\|H\big\|_F^2 &= A^TBA :A^TBA \\ &= B:CBC \\ &= b:\d(CBC) \\ &= b:(C\odot C)b \\ &= b^T\left(C\odot C\right)b \\ }$$