A relation between gradient, Hessian, and Jacobian

88 Views Asked by At

I'm trying to clarify the relation between gradient, Hessian, and Jacobian. Could you check if my below understanding is fine? Thank you so much!


Let $X := \mathbb R^d$ and $\langle \cdot, \cdot \rangle$ be its canonical inner product. Let $X^*$ be the dual space of $X$, $\mathcal L_{\text{sym}}^2(X, \mathbb R)$ the space of all symmetric bilinear maps from $X \times X$ to $\mathbb R$, and $\mathcal L(X)$ the space of all linear maps from $X$ to $X$. Let $f:X \to \mathbb R$ be twice differentiable at $a \in X$.

  1. We have $\partial f (a) \in X^*$. By Riesz representation theorem, there is a unique element in $X$ denoted by $\nabla f (a)$ such that $$ \langle \nabla f (a), x \rangle = \partial f (a) [x] \quad \forall x \in X. $$ Here we use the square bracket $[x]$ to denote the argument of $\partial f (a)$. Then $\nabla f (a)$ is called the gradient of $f$ at $a$. We have $$ \nabla f(a)=\left[\begin{array}{c} \partial_1 f(a)\\ \vdots \\ \partial_d f(a) \end{array}\right], $$ where $\partial_k f(a)$ is the partial derivative of $f$ at $a$ w.r.t. the $k$-th coordinate. We also have $\partial_k f(a) = \partial f (a)[e_k]$ where $(e_k)_{k=1}^d$ is the standard basis of $X$.

  2. We have $\partial^2 f (a) \in \mathcal L_{\text{sym}}^2(X, \mathbb R)$. Then there is a unique self-adjoint $T \in \mathcal L(X)$ such that $$ \partial^2 f (a)[x, y] = \langle Tx, y \rangle \quad \forall x, y\in X. $$ The operator $T$ can be represented uniquely (in the standard basis) as a square symmetric matrix $H_f (a)$, which is called the Hessian of $f$ at $a$. We have $$ H_f (a) := \left[\begin{array}{cccc} \partial^2_1 f(a) & \partial_1 \partial_2 f(a) & \cdots & \partial_1 \partial_d f(a) \\ \partial_2 \partial_1 f(a) & \partial_2^2 f(a) & \cdots & \partial_2 \partial_d f(a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_d \partial_1 f(a) & \partial_d \partial_2 f(a) & \cdots & \partial^2_d f(a) \end{array}\right]. $$

  3. It follows from $f$ is twice differentiable at $a$ that $\nabla f:X \to X$ is differentiable at $a$. Let $J_{\nabla f} (a)$ be the Jacobian of $\nabla f$ at $a$. So $J_{\nabla f} (a)$ is the unique square $d \times d$ matrix represented $\partial (\nabla f) (a) \in \mathcal L(X)$ (in the standard basis). We have $$ J_{\nabla f} (a) = \left[\begin{array}{cccc} \partial_1 (\partial_1 f)(a) & \partial_2 (\partial_1 f)(a) & \cdots & \partial_d (\partial_1 f)(a) \\ \partial_1 (\partial_2 f)(a) & \partial_2(\partial_2 f)(a) & \cdots & \partial_d (\partial_2 f)(a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_1 (\partial_d f)(a) & \partial_2 (\partial_d f)(a) & \cdots & \partial_d( \partial_d f)(a) \end{array}\right]. $$ It follows that $J_{\nabla f} (a) = (H_f (a))^\top$ and thus $J_{\nabla f} (a) = H_f (a)$.