Hessian of composite function

Question

Hessian of composite function

3.1k Views Asked by Bumbble Comm At 28 Mar 2026 - 11:42

Let $\cal{J}(y) : \mathbb{R}^n \rightarrow \mathbb{R}$.

Furthermore, suppose $y := h(x)$ with the nonlinear mapping $h : \mathbb{R}^n \rightarrow \mathbb{R}^n$ twice differentiable. The Jacobian of $h$ is diagonal (i.e., $y_i = h_i(x_i)$ for all $i = 1 \ldots n$).

I want to calculate the second directional derivative of $J$ with respect to the variable $y$ along an arbitrary direction $v \in \mathbb{R}^n$ - as a function of the second derivative of $J$ wrt $x$ (along $v$) and that of the mapping $h$.

The chain rule gives:

$$ \frac{\partial \cal J}{\partial x} = \frac{\partial \cal J}{\partial y} \frac{\partial y}{\partial x}. $$

Here I'm following the standard convention (gradients = "row" vectors).

Then, I calculate the directional derivative of the (transposed) equation above - and things become a bit confusing...

$$ \frac{\partial \cal^2 J}{\partial x^2} \cdot v = \left( \frac{\partial y}{\partial x} \right)^T \left( \frac{\partial \cal^2 J}{\partial y^2} \right) \left( \frac{\partial y}{\partial x} \right) \cdot v + \left( \frac{\partial^2 y}{\partial x^2} \otimes v \right)^T \left( \frac{\partial \cal J}{\partial y} \right)^T, $$ where the $\otimes$ denotes the tensor-vector product operation.

Furthermore, suppose the Jacobian $\displaystyle \frac{\partial y}{\partial x}$ is non-singular over $\mathbb R^n$, and define $$ \widehat{v} = \frac{\partial y}{\partial x} \, v. $$

Then: $$ \frac{\partial \cal^2 J}{\partial y^2} \cdot \widehat{v} = \left( \frac{\partial y}{\partial x} \right)^{-T} \frac{\partial \cal^2 J}{\partial x^2} \left( \frac{\partial y}{\partial x} \right)^{-1} \widehat{v} - \left( \frac{\partial y}{\partial x} \right)^{-T} \left( \frac{\partial^2 y}{\partial x^2} \otimes \left( \frac{\partial y}{\partial x}\right)^{-1} \widehat{v} \right)^T \left( \frac{\partial \cal J}{\partial y} \right)^T. $$

Does this look OK? The matrix dimensions seem to match up...

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

I hope this answers your question, to some extent at least. Keeping track of all the matrices can be a nuisance, and I find it easier to operate with polynomials.

It is good to view this in terms of Taylor polynomials. We have $$ h(x+a) = h(x)+(Dh(x))a+\frac12(D^2h(x))(a,a)+O(|a|^3) $$ where $D^2h(x)$ is a symmetric map $\mathbb R^n\times\mathbb R^n\to\mathbb R^n$. Similarly $$ J(y+b) = J(y)+(DJ(y))b+\frac12(D^2J(y))(b,b)+O(|b|^3). $$ Note that $h$ takes values in $\mathbb R^n$ and $J$ takes values in $\mathbb R$ although the expressions look the same. Putting these together yields $$ \begin{split} J(h(x+a)) &= J(h(x)+(Dh(x))a+\frac12(D^2h(x))(a,a)+O(|a|^3)) \\&= J(h(x))+(DJ(h(x)))[(Dh(x))a+\frac12(D^2h(x))(a,a)]+\frac12(D^2J(h(x)))((Dh(x))a,(Dh(x))a)+O(|a|^3). \end{split} $$ But we should have $$ J(h(x+a)) = J\circ h(x)+(D(J\circ h)(x))a+\frac12(D^2(J\circ h)(x))(a,a)+O(|a|^3). $$ These two polynomial expressions (ignoring the error term) must be equal for all $a$. Comparing the terms gives first the chain rule $$ (D(J\circ h)(x)) = (DJ(h(x)))(Dh(x)) $$ and then for the second derivative $$ (D^2(J\circ h)(x))(a,a) = (DJ(h(x)))(D^2h(x))(a,a)+(D^2J(h(x)))((Dh(x))a,(Dh(x))a). $$ If $H_J(y)$ is the Hessian matrix of $J$ at $y$, then $(D^2J(y))(b,b)=b^TH_J(y)b$. The second term of $D^2(J\circ h)$ corresponds to the matrix $Dh(x)^TH_J(h(x))Dh(x)$, but the first one cannot be written as neatly in matrix form. With indices, we have $$ (D^2(J\circ h)(x))(a,a) = \sum_{i,j,k}\partial_iJ(h(x))\partial_j\partial_kh_i(x)a_ja_k + \sum_{i,j,k,l}\partial_ih_j(x)\partial_j\partial_kJ(h(x))\partial_kh_l(x)a_ia_l . $$

Hessian of composite function

There are 1 best solutions below

Related Questions in REAL-ANALYSIS

Related Questions in MULTIVARIABLE-CALCULUS

Related Questions in DERIVATIVES

Related Questions in IMPLICIT-DIFFERENTIATION

Related Questions in IMPLICIT-FUNCTION-THEOREM

Trending Questions

Popular # Hahtags

Popular Questions