Simplifying a Hessian


Let $b: \mathbb R^m \to \mathbb R$ be a twice differentiable function, $A \in \mathbb R^{d \times m}\;$ and $x \in \mathbb R^d$. Consider the function $f(A) = b(A^T x)$. Let us denote the argument of $b$ by $\theta$. We would like to compute the Hessian of $f$. My calculation suggests that $$ \frac{\partial^2 f}{\partial A_{uv} \,\partial A_{st}} = x_s \,x_u \,\frac{\partial^2 b}{\partial \theta_v \,\partial \theta_t}, $$ where the last term is evaluated at $\theta = A^T x$. Suppose we would like to write this as a Hessian matrix w.r.t. $\text{vec}(A) \in \mathbb R^{md}$, which is obtained by vertically concatenating the columns of $A$. It seems to me that this Hessian can then be written as $$ \Big(\frac{\partial^2 b}{\partial \theta_v \,\partial \theta_t}\Big)_{v,t = 1}^m \otimes xx^T = \nabla^2 b \,\otimes \, xx^T, $$ where $\otimes$ is the Kronecker product and $\nabla^2 b = (\frac{\partial^2 b}{\partial \theta_v \,\partial \theta_t})$ is the Hessian matrix of $b$, evaluated at $\theta = A^T x$. I did convince myself that this is true, but checking it is an indexing mess and I might have fooled myself into believing it.

  1. Is this correct?
  2. Is there a clean way of deriving this?
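The claimed identity can also be sanity-checked numerically. Here is a sketch in NumPy, comparing a finite-difference Hessian of $f$ with respect to $\text{vec}(A)$ against $\nabla^2 b \otimes xx^T$; the cubic test function $b$ and all dimensions are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 4
A = rng.normal(size=(d, m))
x = rng.normal(size=d)
c = rng.normal(size=m)  # arbitrary test function: b(theta) = (c @ theta)^3 / 6

def b(theta):
    return (c @ theta) ** 3 / 6.0

def hess_b(theta):
    # closed-form Hessian of the test b: (c @ theta) * c c^T
    return (c @ theta) * np.outer(c, c)

def f(a_vec):
    # f as a function of vec(A); column-stacking vec is order='F' in NumPy
    A_ = a_vec.reshape(d, m, order="F")
    return b(A_.T @ x)

a0 = A.flatten(order="F")
n = d * m
eps = 1e-4

# Hessian of f wrt vec(A) via central second differences
H_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        hi = np.zeros(n); hi[i] = eps
        hj = np.zeros(n); hj[j] = eps
        H_fd[i, j] = (f(a0 + hi + hj) - f(a0 + hi - hj)
                      - f(a0 - hi + hj) + f(a0 - hi - hj)) / (4 * eps**2)

# claimed closed form
H_kron = np.kron(hess_b(A.T @ x), np.outer(x, x))
print(np.abs(H_fd - H_kron).max())  # should be small (finite-difference noise)
```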



$\def\c#1{\color{red}{#1}}\def\v{{\rm vec}}\def\p{{\partial}}\def\grad#1#2{\frac{\p #1}{\p #2}}\def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3^T}}\def\E{{\cal E}}\def\F{{\cal H}}$Given a function $b(\theta)$, write its gradient and Hessian with respect to the vector $\theta$ as $$\eqalign{ p=\grad{b}{\theta}\doteq\nabla b \quad\qquad Q=\hess{b}{\theta}{\theta} \doteq \nabla^2b \\ }$$ and use the relationship $\,\theta=A^Tx\;$ to obtain the gradient and Hessian with respect to the matrix $A$.

Start by expanding the differential of the function. $$\eqalign{ db &= p:\c{d\theta} \\ &= p:\c{dA^Tx} \\ &= xp^T:dA \\ G \doteq \grad{b}{A} &= xp^T \qquad&\big({\rm gradient\,matrix}\big) \\ }$$ Now expand the differential of the gradient to calculate the hessian. $$\eqalign{ dG &= x\,\c{dp}^T \\ &= x\left(\c{Q\,d\theta}\right)^T \\ &= x\,d\theta^TQ^T \\ &= xx^TdA\,Q^T \\ &= \left(xx^T\cdot\E\cdot Q\right):dA \\ \F \doteq \grad{G}{A} &= xx^T\cdot\E\cdot Q \quad&\big({\rm hessian\,tensor}\big) \\ }$$ where $(\!\ \cdot | :\ \!)\,$ denote single|double dot products between fourth-order tensors and matrices $$\eqalign{ &\E\cdot Q = \sum_{\ell=1}^m\;\E_{ijk\ell}\,Q_{\ell s} &\qquad xx^T\cdot\E = \sum_{i=1}^d\;\big(xx^T\big)_{ri}\,\E_{ijk\ell} \\ &\F = xx^T\cdot\E\cdot Q &\qquad \F:A = \sum_{k=1}^d \sum_{\ell=1}^m\;\F_{ijk\ell} A_{k\ell} \\ }$$ and $\E$ is the fourth-order identity tensor $$\eqalign{ \E &= \grad{A}{A} \qquad\implies\quad \E_{ijk\ell} &= \grad{A_{ij}}{A_{k\ell}} = \delta_{ik}\delta_{j\ell} \\ A &= \E:A = A:\E \\\\ }$$
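Componentwise, the contractions above give $\F_{rjks} = \big(xx^T\big)_{rk}\,Q_{js} = x_r x_k Q_{js}$, which matches the index formula in the question. A small NumPy sketch verifying this, with a random symmetric $Q$ standing in for $\nabla^2 b$ (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 3, 4
x = rng.normal(size=d)
Q = rng.normal(size=(m, m))
Q = Q + Q.T  # symmetric, like a Hessian

# fourth-order identity tensor E_{ijkl} = delta_{ik} delta_{jl}, shape (d, m, d, m)
E = np.einsum('ik,jl->ijkl', np.eye(d), np.eye(m))
xxT = np.outer(x, x)

# H = xx^T . E . Q : single-dot contractions on the first and last tensor indices
H = np.einsum('ri,ijkl,ls->rjks', xxT, E, Q)

# componentwise claim: H_{rjks} = x_r x_k Q_{js}
H_direct = np.einsum('r,k,js->rjks', x, x, Q)
print(np.allclose(H, H_direct))
```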


An alternative to tensors is to flatten the new variable into $\,a=\v(A)$. $$\eqalign{ g &\doteq \left(\grad{b}{a}\right) \;=\; \v\left(\grad{b}{A}\right) \\ &= \v(xp^T) \\ &= p\otimes x \\ \\ dg &= \v(dG) \\ &= \v(xx^TdA\,Q^T) \\ &= (Q\otimes xx^T)\;\v(dA) \\ &= (Q\otimes xx^T)\;da \\ H \doteq \grad{g}{a} &= Q\otimes xx^T \\ }$$
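The key step $\v(xx^T dA\,Q^T) = (Q\otimes xx^T)\,\v(dA)$ is an instance of the standard identity $\v(ABC) = (C^T\otimes A)\,\v(B)$ for the column-stacking vec. A quick numerical check with random matrices (shapes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
C = rng.normal(size=(5, 2))

# vec(ABC) == (C^T kron A) vec(B), with column-stacking vec (order='F')
lhs = (A @ B @ C).flatten(order="F")
rhs = np.kron(C.T, A) @ B.flatten(order="F")
print(np.allclose(lhs, rhs))
```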


$\def\tsum{\textstyle\sum}$ For completeness, I want to include the version using index notation.

Denote by $f$ the function $f(A)=A^Tx$. In indices, that is $$f_i = \tsum_j x_jA_{ji}$$ Denote by $\partial_{st}$ the derivative with respect to $A$, so that $\partial_{st}A_{ji}=\delta_{sj}\delta_{ti}$. Then $$\partial_{st}f_i = \tsum_j x_j\delta_{sj}\delta_{ti} = x_s\delta_{ti}$$ and $\partial_{uv}\partial_{st}f_i=0$, since $f$ is linear in $A$. Now, by the chain rule and the Leibniz rule you have $$\begin{aligned} \partial_{st}(b\circ f) &= \tsum_k(\partial_kb\circ f)\cdot(\partial_{st}f_k) \\ \partial_{uv}\partial_{st}(b\circ f) &= \tsum_k\big[\partial_{uv}(\partial_kb\circ f)\cdot(\partial_{st}f_k) + (\partial_kb\circ f)\cdot(\partial_{uv}\partial_{st}f_k)\big] \\ &= \tsum_{k,l}(\partial_l\partial_kb\circ f)\cdot(\partial_{uv}f_l)\cdot(\partial_{st}f_k) \\ &= \tsum_{k,l}(Hb\circ f)_{lk}\cdot(x_u\delta_{vl})\cdot(x_s\delta_{tk}) \\ &= x_ux_s(Hb\circ f)_{vt}, \end{aligned}$$ where the second Leibniz term drops out because $\partial_{uv}\partial_{st}f_k=0$, as desired.
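The final identity $\partial_{uv}\partial_{st}(b\circ f) = x_u x_s\,(Hb\circ f)_{vt}$ can also be checked entry by entry with finite differences. A NumPy sketch, using an arbitrary exponential test function for $b$ (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 2, 3
A = 0.3 * rng.normal(size=(d, m))
x = rng.normal(size=d)
w = rng.normal(size=m)  # arbitrary test function: b(theta) = exp(w @ theta)

def b(theta):
    return np.exp(w @ theta)

def hess_b(theta):
    # (Hb)(theta) = exp(w @ theta) * w w^T
    return np.exp(w @ theta) * np.outer(w, w)

def g(A_):
    return b(A_.T @ x)

eps = 1e-4
H = hess_b(A.T @ x)
max_err = 0.0
for u in range(d):
    for v in range(m):
        for s in range(d):
            for t in range(m):
                E_uv = np.zeros((d, m)); E_uv[u, v] = eps
                E_st = np.zeros((d, m)); E_st[s, t] = eps
                # central second difference wrt A_{uv} and A_{st}
                fd = (g(A + E_uv + E_st) - g(A + E_uv - E_st)
                      - g(A - E_uv + E_st) + g(A - E_uv - E_st)) / (4 * eps**2)
                max_err = max(max_err, abs(fd - x[u] * x[s] * H[v, t]))
print(max_err)  # should be small (finite-difference noise)
```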