Second derivative of composition of vector function

Question

1.1k Views Asked by Bumbble Comm At 13 Apr 2026 - 9:45

In Professor Stephen Boyd's Convex Optimization slide 3-18:
Suppose $g:\mathbb{R}^n\rightarrow\mathbb{R}^k$, $h:\mathbb{R}^k\rightarrow\mathbb{R}$, $$f(x)=h(g(x))=h(g_1(x),\cdots,g_k(x)),$$ and $h, g$ differentiable, then $$f''(x)=g'(x)^\mathrm{T}\nabla^2h(g(x))g'(x)+\nabla h(g(x))^\mathrm{T}g''(x).$$

I can only derive the first term (not sure if it's correct) as $J_g(x)^\mathrm{T}H_h(g)^\mathrm{T}J_g(x)$, where $J$ is the Jacobian matrix and $H$ is the Hessian matrix. I can't write the second term in a compact form.

Questions:
1. Is my derivation correct?
2. Is $g'(x)$ a matrix or a vector? What does the notation $g''(x)$ mean?

I know someone asked about this formula here, but I don't understand the answer. Thanks in advance.

Bumbble Comm On 12 Mar 2018 - 4:59

His notation here seems confusing. As you noted, we can take $f(\boldsymbol{x}) = h(\boldsymbol{g}(\boldsymbol{x})) = h(g_1(\boldsymbol{x}), \cdots, g_k(\boldsymbol{x}))$. Let's first find the gradient of $f(\boldsymbol{x})$ using the chain rule in index notation, with summation over the repeated index $p$ implied:

\begin{align} \frac{\partial f}{\partial x_m} &= \frac{\partial h}{\partial g_p} \frac{\partial g_p}{\partial x_m} \\ \end{align}

In matrix form, taking the gradient as a column vector and the Jacobian $\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}$ as a $k\times n$ matrix, this reads $\nabla_x f(\boldsymbol{x}) = \left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}\right)^T \nabla_g h(\boldsymbol{g}(\boldsymbol{x}))$. Working from the index form of the gradient above, we can obtain the Hessian in the following manner:

\begin{align} \frac{\partial^2 f}{\partial x_m \partial x_n} &= \frac{\partial}{\partial x_n} \left \lbrace \frac{\partial h}{\partial g_p} \frac{\partial g_p}{\partial x_m} \right \rbrace \\ &= \frac{\partial}{\partial x_n} \left \lbrace \frac{\partial h}{\partial g_p} \right \rbrace \frac{\partial g_p}{\partial x_m} + \frac{\partial h}{\partial g_p} \frac{\partial}{\partial x_n} \left \lbrace \frac{\partial g_p}{\partial x_m} \right \rbrace \\ &= \frac{\partial^2 h}{\partial g_p \partial g_q} \frac{\partial g_q}{\partial x_n} \frac{\partial g_p}{\partial x_m} + \frac{\partial h}{\partial g_p} \frac{\partial^2 g_p}{\partial x_m \partial x_n}\\ \end{align}

We can simplify this result into the following:

\begin{align} \nabla^2_x f(\boldsymbol{x}) &= \left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}\right)^T \nabla^2_g h(\boldsymbol{g}(\boldsymbol{x})) \left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}\right) + \sum_{i=1}^k \frac{\partial h}{\partial g_i}(\boldsymbol{g}(\boldsymbol{x})) \nabla^2_x g_i(\boldsymbol{x}) \end{align}

It seems as though $g'(x)$ corresponds to the Jacobian $\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}$, and $\boldsymbol{g}^{''}(\boldsymbol{x})$ to the Hessians of the components $g_i$, $i = 1, \cdots, k$, bundled together. The product $\nabla h(\boldsymbol{g}(\boldsymbol{x}))^T \boldsymbol{g}^{''}(\boldsymbol{x})$ then represents the summation in the second term above.
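
The Hessian formula above can be sanity-checked on a small example. The sketch below is only an illustration and not part of the original derivation: the choices $g(x) = (x_0 x_1,\ x_0^2 + x_1)$ and $h(y) = y_0^2 y_1$, as well as all function names, are assumptions made here. It builds the Hessian from the two terms of the formula and compares it with the Hessian of $f(x) = x_0^4 x_1^2 + x_0^2 x_1^3$ computed by hand.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical example functions (n = 2, k = 2), chosen for illustration only.
def g(x):
    x0, x1 = x
    return [x0 * x1, x0**2 + x1]


def jac_g(x):
    """k x n Jacobian of g."""
    x0, x1 = x
    return [[x1, x0], [2 * x0, 1]]


def hess_g(x):
    """One n x n Hessian per component g_i (constant for this g)."""
    return [[[0, 1], [1, 0]], [[2, 0], [0, 0]]]


def grad_h(y):
    y0, y1 = y
    return [2 * y0 * y1, y0**2]


def hess_h(y):
    """k x k Hessian of h."""
    y0, y1 = y
    return [[2 * y1, 2 * y0], [2 * y0, 0]]


def hess_f_chain(x):
    """Hessian of f = h(g(x)) assembled from the two terms of the formula."""
    y, J, Hg = g(x), jac_g(x), hess_g(x)
    n, k = len(x), len(y)
    # First term: J^T (Hessian of h) J
    first = matmul(matmul(transpose(J), hess_h(y)), J)
    # Second term: sum_i (dh/dg_i) * (Hessian of g_i)
    dh = grad_h(y)
    second = [[sum(dh[i] * Hg[i][m][q] for i in range(k)) for q in range(n)]
              for m in range(n)]
    return [[first[m][q] + second[m][q] for q in range(n)] for m in range(n)]


def hess_f_direct(x):
    """Hessian of f(x) = x0^4 x1^2 + x0^2 x1^3, differentiated by hand."""
    x0, x1 = x
    mixed = 8 * x0**3 * x1 + 6 * x0 * x1**2
    return [[12 * x0**2 * x1**2 + 2 * x1**3, mixed],
            [mixed, 2 * x0**4 + 6 * x0**2 * x1]]


print(hess_f_chain([1, 2]))  # [[64, 40], [40, 14]]
```

With integer inputs the two routes agree exactly, so `hess_f_chain(x) == hess_f_direct(x)` holds without any tolerance.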

**Bumbble Comm** · Accepted Answer

Here's how I would derive the expression in a coordinate-free way: let $df(x)$ be the differential of $f$ at $x$, i.e., the linear function mapping infinitesimal changes (tangent vectors) $\delta x$ of $x$ to infinitesimal changes of $f$.

The chain rule gives us that $$\left[d(h\circ g)\right]\delta x = \left[dh\left(g[x]\right)\right]\left[dg(x)\right]\delta x$$ i.e. the differential of $g$ pushes forward $\delta x$ to $\delta g$ at $g(x)$, which then gets pushed forward again by $dh$.

Now in your case $$df(\delta x) = \left[dh\left(g[x]\right)\right]\left[dg(x)\right]\delta x.$$

By the product rule, $d(uv)\delta x = (du\, \delta x) v + u(dv\,\delta x)$ for generic factors $u$ and $v$, so, denoting by $d^2f(\delta_1 x, \delta_2 x)$ the second differential in the $\delta_1 x$ and $\delta_2 x$ directions, $$d^2f(x) (\delta_1 x,\delta_2 x) = \left[d^2h(g[x])\right](dg(x)\delta_1 x, dg(x)\delta_2 x) + \left[dh(g[x])\right]\left[d^2g(x)\right](\delta_1 x, \delta_2 x).$$
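
The intermediate step can be spelled out as follows. This is a sketch, assuming $g$ and $h$ are twice continuously differentiable (so that the second differentials exist and are symmetric). Hold $\delta_1 x$ fixed and differentiate the map $x \mapsto \left[dh(g[x])\right]\left[dg(x)\right]\delta_1 x$ in the direction $\delta_2 x$; the product rule acts on the two $x$-dependent factors, and the chain rule is applied once more to $dh(g[x])$:

\begin{align} d^2f(x)(\delta_1 x, \delta_2 x) &= \Big[d\big(dh(g[x])\big)\,\delta_2 x\Big]\big(dg(x)\,\delta_1 x\big) + \left[dh(g[x])\right]\Big[d\big(dg(x)\big)\,\delta_2 x\Big]\delta_1 x \\ &= \left[d^2h(g[x])\right]\big(dg(x)\,\delta_1 x,\ dg(x)\,\delta_2 x\big) + \left[dh(g[x])\right]\left[d^2g(x)\right](\delta_1 x, \delta_2 x) \end{align}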

Now some care is needed to interpret these objects. $d^2f$ is a bilinear function $\mathbb{R}^n\times\mathbb{R}^n\to \mathbb{R}$ which computes the second-order change in $f$ given two tangent directions $\delta_1 x$ and $\delta_2 x$ at $x\in\mathbb{R}^n$.

$d^2g(x)$ is a rank-three tensor: it bilinearly maps $\delta_1 x$ and $\delta_2x$ to a vector in $\mathbb{R}^k$ representing the second-order change in $g$. $dh$ is a $1\times k$ matrix (a row vector) mapping an infinitesimal change $\delta y\in\mathbb{R}^k$ to the first-order change in $h$ (a real number). The composition thus is a bilinear map $\mathbb{R}^n\times\mathbb{R}^n\to \mathbb{R}$, as it must be to agree with $d^2f$.

In the first term, $d^2h$ is a bilinear map $\mathbb{R}^k\times\mathbb{R}^k\to \mathbb{R}$ which computes the second-order change in $h$, given two tangent vectors in $\mathbb{R}^k$ supplied by the push-forward $dg$.

In terms of the Jacobian $J$ and Hessian $H$, we can write the above as

$$\delta_1 x^T Hf(x) \delta_2 x = \delta_1 x^T \left[Jg(x)\right]^T Hh(g[x]) Jg(x) \delta_2 x + Jh(g[x]) \left(\left[d^2g(x)\right](\delta_1 x, \delta_2 x)\right)$$

Here,

$Hf$ is an $n\times n$ matrix;
$Jg$ is a $k\times n$ matrix;
$Hh$ is a $k\times k$ matrix;
$Jh$ is a $1\times k$ row vector;
$d^2g$ is a bilinear function $\mathbb{R}^n\times \mathbb{R}^n\to\mathbb{R}^k$. You can think of this as the rank three "Hessian" of $g$, but it is symmetric in only two of its dimensions, so care needs to be taken to keep its indices straight.
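
To keep the indices of $d^2g$ straight, it may help to write the second term in components. The following is a sketch in which the index names $p$, $m$, $l$ are introduced here for illustration; $p$ runs over the $k$ components of $g$, and $m$, $l$ over the $n$ coordinates of $x$:

\begin{align} \Big(\left[d^2g(x)\right](\delta_1 x, \delta_2 x)\Big)_p &= \sum_{m,l=1}^n \frac{\partial^2 g_p}{\partial x_m \partial x_l}(x)\,(\delta_1 x)_m (\delta_2 x)_l = \delta_1 x^T\, Hg_p(x)\, \delta_2 x, \\ Jh(g[x]) \Big(\left[d^2g(x)\right](\delta_1 x, \delta_2 x)\Big) &= \delta_1 x^T \left(\sum_{p=1}^k \frac{\partial h}{\partial g_p}(g[x])\, Hg_p(x)\right) \delta_2 x. \end{align}

Read this way, the second term is a weighted sum of the $k$ ordinary $n\times n$ Hessians $Hg_p$, which matches the summation form given in the other answer.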

The first term agrees directly with the first term of the OP; in the second term, $g''$ must refer to the rank three second derivative of $g$, and is the part that is evaluated when $f''$ is "hit on both sides" by vectors, with the result then multiplied by $\nabla h^T$. I myself would not play so fast and loose with my notation.
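
As a check on the notation, consider the special case $n = 1$ (an assumption made here purely for illustration). Then $Jg(x)$ is a $k\times 1$ column, which can be written as the vector $g'(x)\in\mathbb{R}^k$, and $d^2g(x)$ collapses to the vector $g''(x)\in\mathbb{R}^k$ of componentwise second derivatives. Every term becomes a scalar:

\begin{align} f''(x) &= g'(x)^T\, Hh(g[x])\, g'(x) + Jh(g[x])\, g''(x), \qquad g'(x) = \begin{pmatrix} g_1'(x) \\ \vdots \\ g_k'(x) \end{pmatrix}, \quad g''(x) = \begin{pmatrix} g_1''(x) \\ \vdots \\ g_k''(x) \end{pmatrix}. \end{align}

In this case the formula in the question can be read literally, with $g'$ and $g''$ as vectors rather than a matrix and a rank-three object.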
