Chain rule and product rule for computing Hessian from second differential

I want to compute the Hessian of $f(\mathbf{X}) : \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$. I've computed the first differential, which has the form: \begin{align*} df = t(\mathbf{X}) \ vec(\mathbf{X})^T \mathbf{A} \ dvec(\mathbf{X}) \end{align*} where $t(\mathbf{X})$ is a scalar function of $\mathbf{X}$ and I can compute $dt = vec(\mathbf{Q})^T dvec(\mathbf{X})$. But I'm not sure how to put everything together into a quadratic form (where $\mathbf{H}$ is the Hessian): \begin{align*} d^2f = dvec(\mathbf{X})^T \ \mathbf{H} \ dvec(\mathbf{X}). \end{align*} How do I represent $\mathbf{H}$ in terms of $t(\mathbf{X}), \mathbf{A},$ and $\mathbf{Q}$?
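To make the shapes concrete: with the usual column-stacking $vec$, $vec(\mathbf{X})$ has $mn$ entries, so $\mathbf{A}$ in $df$ must be $mn \times mn$ (and so must $\mathbf{H}$), while $\mathbf{Q}$ is the $m \times n$ matrix whose vectorization is the gradient of $t$. A minimal NumPy sketch, assuming column-major $vec$ (any fixed ordering works if it is used consistently) and a hypothetical $t(\mathbf{X}) = \|\mathbf{X}\|_F^2$, chosen only because its $\mathbf{Q} = 2\mathbf{X}$ is easy to write down:

```python
import numpy as np


def vec(M):
    """Column-stacking vectorization: the columns of M placed one after another."""
    return M.reshape(-1, order="F")


def unvec(v, m, n):
    """Inverse of vec for an m-by-n matrix."""
    return v.reshape((m, n), order="F")


# Hypothetical example, not from the question: t(X) = ||X||_F^2,
# whose differential is dt = vec(2X)^T dvec(X), i.e. Q = 2X.
def t(X):
    return float(np.sum(X * X))


def Q_of(X):
    return 2.0 * X


rng = np.random.default_rng(0)
m, n = 3, 2
X = rng.standard_normal((m, n))
dX = 1e-6 * rng.standard_normal((m, n))

# First-order change predicted by dt = vec(Q)^T dvec(X) versus the actual change.
predicted = vec(Q_of(X)) @ vec(dX)
actual = t(X + dX) - t(X)
print(abs(actual - predicted))  # second order in dX, so tiny
```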

Similarly, suppose I have a first differential of the form: \begin{gather*} dg = u(\mathbf{X})\ vec(\mathbf{A})^T dvec(\mathbf{X}), \end{gather*} where I have $du = vec(\mathbf{R})^T dvec(\mathbf{X})$. Again, I want to write out the second differential in terms of the Hessian: \begin{gather*} d^2g = dvec(\mathbf{X})^T \ \mathbf{H} \ dvec(\mathbf{X}). \end{gather*}

If $t(\mathbf{X})=u(\mathbf{X})=1$, I think we get $\mathbf{H}=\mathbf{A}$ and $\mathbf{H}=\mathbf{0}$, respectively, but I don't know how to handle the scalar functions when they depend on $\mathbf{X}$.
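Both special cases can be confirmed from the definition of the second differential. A short derivation, assuming $x = vec(\mathbf{X})$ is the independent variable (so $d(dx) = 0$) and writing $a = vec(\mathbf{A})$:
\begin{align*} t \equiv 1: \quad d^2f &= d\big(x^T\mathbf{A}\,dx\big) = dx^T\mathbf{A}\,dx \;\Rightarrow\; \mathbf{H} = \mathbf{A}, \\ u \equiv 1: \quad d^2g &= d\big(a^T dx\big) = 0 \;\Rightarrow\; \mathbf{H} = \mathbf{0}. \end{align*}
Since $dx^T\mathbf{A}\,dx = dx^T\mathbf{A}^T dx$, the quadratic form alone only fixes the symmetric part $\tfrac12(\mathbf{A}+\mathbf{A}^T)$; if $df = x^T\mathbf{A}\,dx$ is the differential of an actual function, $\mathbf{A}$ must already be symmetric, so $\mathbf{H} = \mathbf{A}$ exactly.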

Best answer

Allow me to use vectors to denote the vectorized matrices, e.g.
$$\eqalign{
q &= {\rm vec}(Q) \cr
x &= {\rm vec}(X) \cr
&\ldots {\rm etc} \ldots
}$$
Now write the differential in terms of the Frobenius (:) product, which for vectors is simply $a:b = a^Tb$:
$$\eqalign{
df &= t\,(x^TA)\,dx \cr
&= t\,(A^Tx):dx
}$$
So the gradient is
$$
g = \frac{\partial f}{\partial x} = t\,(A^Tx)
$$
(here $g$ denotes the gradient of $f$, not the function $g$ of your second question).

To find the Hessian, begin with the differential of the gradient, using $dt = q^Tdx$:
$$\eqalign{
dg &= dt\,(A^Tx) + t\,(A^Tdx) \cr
&= (q^Tdx)\,(A^Tx) + (t\,A^T)\,dx \cr
&= [(A^Tx)\,q^T + t\,A^T]\,dx \cr
&= A^T(xq^T + tI)\,dx
}$$
Yielding the Hessian as
$$
H = \frac{\partial g}{\partial x} = A^T(xq^T + tI)
$$
Your second question can be answered in the same way.
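As a sketch of "the same way" for the second question (an extension, not spelled out in the answer), write $a = {\rm vec}(A)$ and $r = {\rm vec}(R)$. Then $dg = u\,a^Tdx$ gives the gradient $u\,a$, whose differential is $a\,du = (a\,r^T)\,dx$, so $H = a\,r^T$; with $u \equiv 1$ this reduces to $H = 0$, as the question expected. Both Hessians can be sanity-checked numerically by comparing them with a central-difference Jacobian of the corresponding gradient. In the NumPy sketch below, the choices $t(x) = \exp(c^Tx)$ and $u(x) = x^Tx$, the random $A$, $a$, $c$, and the helper `numerical_jacobian` are all hypothetical, picked only so that $q$ and $r$ are easy to write down:

```python
import numpy as np


def numerical_jacobian(func, x, h=1e-5):
    """Central-difference Jacobian J[i, j] = d func_i / d x_j (hypothetical helper)."""
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((func(x + e) - func(x - e)) / (2.0 * h))
    return np.column_stack(cols)


rng = np.random.default_rng(1)
m, n = 3, 2
N = m * n                          # x = vec(X) lives in R^{mn}
x = rng.standard_normal(N)

# First question: df = t(x) x^T A dx, with A an (mn x mn) matrix.
A = rng.standard_normal((N, N))
c = 0.1 * rng.standard_normal(N)


def t(x):                          # hypothetical t(x) = exp(c^T x)
    return np.exp(c @ x)


def q(x):                          # its gradient, so dt = q^T dx
    return c * np.exp(c @ x)


def grad_f(x):                     # gradient from the answer: t (A^T x)
    return t(x) * (A.T @ x)


H_f = A.T @ (np.outer(x, q(x)) + t(x) * np.eye(N))
err_f = np.max(np.abs(numerical_jacobian(grad_f, x) - H_f))

# Second question: dg = u(x) a^T dx, with a = vec(A) for an (m x n) matrix A.
a = rng.standard_normal(N)


def u(x):                          # hypothetical u(x) = x^T x
    return x @ x


def r(x):                          # its gradient, so du = r^T dx
    return 2.0 * x


def grad_g(x):                     # gradient of the second function: u a
    return u(x) * a


H_g = np.outer(a, r(x))
err_g = np.max(np.abs(numerical_jacobian(grad_g, x) - H_g))

print(err_f, err_g)                # both should be at finite-difference noise level
```

Because the check differentiates the gradient expressions directly, it does not require that functions with these differentials actually exist, so the matrices compared here need not be symmetric; for a genuine $f$ or $g$ the Hessian would come out symmetric.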