In Professor Stephen Boyd's Convex Optimization slides (slide 3-18):
Suppose $g:\mathbb{R}^n\rightarrow\mathbb{R}^k$, $h:\mathbb{R}^k\rightarrow\mathbb{R}$,
$$f(x)=h(g(x))=h(g_1(x),\cdots,g_k(x)),$$
and $h, g$ differentiable, then
$$f''(x)=g'(x)^\mathrm{T}\nabla^2h(g(x))g'(x)+\nabla h(g(x))^\mathrm{T}g''(x).$$
I can only derive the first term (not sure if it's correct) as $J_g(x)^\mathrm{T}H_h(g)^\mathrm{T}J_g(x)$, where $J$ is Jacobian matrix and $H$ is Hessian matrix. I can't write the second term in a compact form.
Questions:
1. Is my derivation correct?
2. Is $g'(x)$ a matrix or a vector? What does the notation $g''(x)$ mean?
I know someone asked about this formula here, but I don't understand the answer. Thanks in advance.
Here's how I would derive the expression in a coordinate-free way: let $df(x)$ be the differential of $f$ at $x$, i.e., the linear function mapping infinitesimal changes (tangent vectors) $\delta x$ of $x$ to infinitesimal changes of $f$.
The chain rule gives us that $$\left[d(h\circ g)\right]\delta x = \left[dh\left(g[x]\right)\right]\left[dg(x)\right]\delta x$$ i.e. the differential of $g$ pushes forward $\delta x$ to $\delta g$ at $g(x)$, which then gets pushed forward again by $dh$.
In your case $f = h\circ g$, so $$df(x)\,\delta x = \left[dh\left(g[x]\right)\right]\left[dg(x)\right]\delta x.$$
Now differentiate once more, in a second direction. By the product rule, $d(AB)\,\delta x = (dA\,\delta x)B + A(dB\,\delta x)$, applied to the two factors $A = dh(g[x])$ and $B = dg(x)$ (using the chain rule again for $A$), and denoting by $d^2f(\delta_1 x, \delta_2 x)$ the second differential in the $\delta_1 x$ and $\delta_2 x$ directions, $$d^2f(x) (\delta_1 x,\delta_2 x) = \left[d^2h(g[x])\right](dg(x)\delta_1 x, dg(x)\delta_2 x) + \left[dh(g[x])\right]\left[d^2g(x)\right](\delta_1 x, \delta_2 x).$$
Now some care is needed to interpret these objects. $d^2f$ is a bilinear function $\mathbb{R}^n\times\mathbb{R}^n\to \mathbb{R}$ which computes the second-order change in $f$ given two tangent directions $\delta_1 x$ and $\delta_2 x$ at $x\in\mathbb{R}^n$.
$d^2g(x)$ is a rank-three tensor: it bilinearly maps $\delta_1 x$ and $\delta_2x$ to a vector in $\mathbb{R}^k$ representing the second-order change in $g$. $dh$ is a $1\times k$ matrix (a row vector) mapping an infinitesimal change $\delta y\in\mathbb{R}^k$ to the first-order change in $h$ (a real number). The composition is thus a bilinear map $\mathbb{R}^n\times\mathbb{R}^n\to \mathbb{R}$, as it must be to agree with $d^2f$.
In the first term, $d^2h$ is a bilinear map $\mathbb{R}^k\times\mathbb{R}^k\to \mathbb{R}$ which computes the second-order change in $h$, given two tangent vectors in $\mathbb{R}^k$ supplied by the push-forward $dg$.
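To make the bookkeeping concrete, here is the same statement written out in coordinates. This is only a sketch: the intermediate variable $y = g(x)$ and the index names $a, b$ (running over $1,\dots,k$) and $i, j$ (running over $1,\dots,n$) are introduced here for illustration and do not appear in the slide.
$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x) = \sum_{a=1}^k\sum_{b=1}^k \frac{\partial^2 h}{\partial y_a\,\partial y_b}(g(x))\,\frac{\partial g_a}{\partial x_i}(x)\,\frac{\partial g_b}{\partial x_j}(x) + \sum_{a=1}^k \frac{\partial h}{\partial y_a}(g(x))\,\frac{\partial^2 g_a}{\partial x_i\,\partial x_j}(x).$$
The double sum involves only first derivatives of $g$ and second derivatives of $h$. The single sum is a combination of the Hessians of the scalar components $g_a$, weighted by the partial derivatives of $h$, which is one compact way to read the second term.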
In terms of the Jacobian $J$ and Hessian $H$, we can write the above as
$$\delta_1 x^T Hf(x) \delta_2 x = \delta_1 x^T \left[Jg(x)\right]^T Hh(g[x]) Jg(x) \delta_2 x + Jh(g[x]) \left(\left[d^2g(x)\right](\delta_1 x, \delta_2 x)\right)$$
Here, the first term agrees directly with the first term of the OP, with $g'(x)$ read as the Jacobian $Jg(x)$; in the second term, $g''$ must refer to the rank-three second derivative of $g$: it is first evaluated on the two vectors that "hit $f''$ on both sides", and the resulting vector in $\mathbb{R}^k$ is then multiplied by $\nabla h^T$. I myself would not play so fast and loose with my notation.
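If it helps, the formula can be sanity-checked numerically. The sketch below is only an illustration under assumptions of my own: the particular functions $g(x) = (x_1x_2,\ \sin x_1 + x_2^2)$ and $h(y) = y_1^2y_2$, the test point, and all function names are made up for the check and are not from the slide. It builds the Hessian of $f = h\circ g$ from the two terms (with the second term computed as a weighted sum of the Hessians of the components of $g$) and compares it with a central finite-difference Hessian.

```python
import math

# Hypothetical example functions (not from the slide): g: R^2 -> R^2, h: R^2 -> R.
def g(x):
    x1, x2 = x
    return [x1 * x2, math.sin(x1) + x2 ** 2]

def h(y):
    y1, y2 = y
    return y1 ** 2 * y2

def f(x):
    return h(g(x))

def jac_g(x):
    # k x n Jacobian of g: row a holds the partial derivatives of g_a.
    x1, x2 = x
    return [[x2, x1], [math.cos(x1), 2 * x2]]

def hess_g(x):
    # Rank-three object: hess_g(x)[a] is the n x n Hessian of the component g_a.
    x1, x2 = x
    return [
        [[0.0, 1.0], [1.0, 0.0]],
        [[-math.sin(x1), 0.0], [0.0, 2.0]],
    ]

def grad_h(y):
    y1, y2 = y
    return [2 * y1 * y2, y1 ** 2]

def hess_h(y):
    y1, y2 = y
    return [[2 * y2, 2 * y1], [2 * y1, 0.0]]

def hess_f_formula(x):
    # Entry (i, j): sum_ab J[a][i] Hh[a][b] J[b][j] + sum_a dh[a] G[a][i][j].
    y = g(x)
    J, G, dh, Hh = jac_g(x), hess_g(x), grad_h(y), hess_h(y)
    n, k = len(x), len(y)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            first = sum(J[a][i] * Hh[a][b] * J[b][j]
                        for a in range(k) for b in range(k))
            second = sum(dh[a] * G[a][i][j] for a in range(k))
            H[i][j] = first + second
    return H

def hess_f_numeric(x, eps=1e-4):
    # Central finite-difference approximation of the Hessian of f.
    n = len(x)

    def shifted(i, si, j, sj):
        z = list(x)
        z[i] += si * eps
        z[j] += sj * eps
        return f(z)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * eps ** 2)
             for j in range(n)] for i in range(n)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

x0 = [0.7, -1.3]  # arbitrary test point
print("max |formula - finite difference| =",
      max_abs_diff(hess_f_formula(x0), hess_f_numeric(x0)))
```

The printed difference should be small, limited only by the finite-difference error. Dropping the `second` term in `hess_f_formula` makes the two Hessians disagree, which shows that the first term alone is not the full second derivative unless every $g_a$ is affine.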