I'm trying to clarify the relation between gradient, Hessian, and Jacobian. Could you check if my below understanding is fine? Thank you so much!
Let $X := \mathbb R^d$ and $\langle \cdot, \cdot \rangle$ be its canonical inner product. Let $X^*$ be the dual space of $X$, $\mathcal L_{\text{sym}}^2(X, \mathbb R)$ the space of all symmetric bilinear maps from $X \times X$ to $\mathbb R$, and $\mathcal L(X)$ the space of all linear maps from $X$ to $X$. Let $f:X \to \mathbb R$ be twice differentiable at $a \in X$.
We have $\partial f (a) \in X^*$. By Riesz representation theorem, there is a unique element in $X$ denoted by $\nabla f (a)$ such that $$ \langle \nabla f (a), x \rangle = \partial f (a) [x] \quad \forall x \in X. $$ Here we use the square bracket $[x]$ to denote the argument of $\partial f (a)$. Then $\nabla f (a)$ is called the gradient of $f$ at $a$. We have $$ \nabla f(a)=\left[\begin{array}{c} \partial_1 f(a)\\ \vdots \\ \partial_d f(a) \end{array}\right], $$ where $\partial_k f(a)$ is the partial derivative of $f$ at $a$ w.r.t. the $k$-th coordinate. We also have $\partial_k f(a) = \partial f (a)[e_k]$ where $(e_k)_{k=1}^d$ is the standard basis of $X$.
We have $\partial^2 f (a) \in \mathcal L_{\text{sym}}^2(X, \mathbb R)$. Then there is a unique self-adjoint $T \in \mathcal L(X)$ such that $$ \partial^2 f (a)[x, y] = \langle Tx, y \rangle \quad \forall x, y\in X. $$ The operator $T$ can be represented uniquely (in the standard basis) as a square symmetric matrix $H_f (a)$, which is called the Hessian of $f$ at $a$. We have $$ H_f (a) := \left[\begin{array}{cccc} \partial^2_1 f(a) & \partial_1 \partial_2 f(a) & \cdots & \partial_1 \partial_d f(a) \\ \partial_2 \partial_1 f(a) & \partial_2^2 f(a) & \cdots & \partial_2 \partial_d f(a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_d \partial_1 f(a) & \partial_d \partial_2 f(a) & \cdots & \partial^2_d f(a) \end{array}\right]. $$
It follows from $f$ is twice differentiable at $a$ that $\nabla f:X \to X$ is differentiable at $a$. Let $J_{\nabla f} (a)$ be the Jacobian of $\nabla f$ at $a$. So $J_{\nabla f} (a)$ is the unique square $d \times d$ matrix represented $\partial (\nabla f) (a) \in \mathcal L(X)$ (in the standard basis). We have $$ J_{\nabla f} (a) = \left[\begin{array}{cccc} \partial_1 (\partial_1 f)(a) & \partial_2 (\partial_1 f)(a) & \cdots & \partial_d (\partial_1 f)(a) \\ \partial_1 (\partial_2 f)(a) & \partial_2(\partial_2 f)(a) & \cdots & \partial_d (\partial_2 f)(a) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_1 (\partial_d f)(a) & \partial_2 (\partial_d f)(a) & \cdots & \partial_d( \partial_d f)(a) \end{array}\right]. $$ It follows that $J_{\nabla f} (a) = (H_f (a))^\top$ and thus $J_{\nabla f} (a) = H_f (a)$.