The connection between the Jacobian, Hessian and the gradient?

In this Wikipedia article they have this to say about the gradient:

If $m = 1$, $\mathbf{f}$ is a scalar field and the Jacobian matrix is reduced to a row vector of partial derivatives of $\mathbf{f}$—i.e. the gradient of $\mathbf{f}$.

As well as

The Jacobian of the gradient of a scalar function of several variables has a special name: the Hessian matrix, which in a sense is the "second derivative" of the function in question.

So I tried doing the calculations, and was stumped.

If we let $f: \mathbb{R}^n \to \mathbb{R}$, then $$Df = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \dots & \frac{\partial f}{\partial x_n} \end{bmatrix} = \nabla f$$ So far so good, but when I try to calculate the Jacobian matrix of the gradient I get $$D^2f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_2 \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_1} \\ \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_n} & \frac{\partial^2 f}{\partial x_2 \partial x_n} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$ which, according to this article, is not the Hessian matrix but rather its transpose, and from what I can gather the Hessian is not symmetric in general.

So I have two questions: is the gradient generally thought of as a row vector? And did I do something wrong when I calculated the Jacobian of the gradient of $f$, or is the Wikipedia article incorrect?

3 Answers

BEST ANSWER

You did not do anything wrong in your calculation. If you directly compute the Jacobian of the gradient of $f$ with the conventions you used, you will end up with the transpose of the Hessian. This is noted more clearly in the introduction to the Hessian on Wikipedia (https://en.wikipedia.org/wiki/Hessian_matrix) where it says

The Hessian matrix can be considered related to the Jacobian matrix by $\mathbf{H}(f(\mathbf{x})) = \mathbf{J}(\nabla f(\mathbf{x}))^T$.

The other Wikipedia article should probably be updated to use matching language.

As for the gradient of $f$ being defined as a row vector, that is the way I have seen it more often, but it is noted at https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions that there are competing conventions for general matrix derivatives. However, I don't think that should change your answer for the Hessian: with the conventions you are using, you are correct that it should be transposed.
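As a concrete check, here is a minimal SymPy sketch; the function $f$ and its three variables are illustrative choices, not anything from the question. It computes the Jacobian of the gradient and confirms that its transpose equals SymPy's Hessian. Since this $f$ is smooth, the Hessian is symmetric, so the transpose makes no numerical difference, only a notational one.

```python
# Minimal SymPy sketch: Jacobian of the gradient vs. the Hessian.
# f and the variables x1, x2, x3 are illustrative assumptions.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
X = sp.Matrix([x1, x2, x3])
f = x1**2 * x2 + sp.exp(x1 * x3)      # any smooth scalar field

Df = sp.Matrix([f]).jacobian(X)       # 1 x 3 row vector: the derivative Df
grad = Df.T                           # 3 x 1 column vector: the gradient
J_grad = grad.jacobian(X)             # Jacobian of the gradient, 3 x 3
H = sp.hessian(f, (x1, x2, x3))       # SymPy's Hessian

# H = J(grad f)^T, as the Wikipedia Hessian article states ...
assert sp.simplify(J_grad.T - H) == sp.zeros(3, 3)
# ... and since f is C^2, the Hessian is symmetric, so J(grad f) = H too.
assert sp.simplify(J_grad - H) == sp.zeros(3, 3)
```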

ANSWER

In A.4.1 of Boyd & Vandenberghe's (B&V's) Convex Optimization book, given a scalar function $f(x)$ with $x\in \mathbf{R}^n$, the transpose of the derivative (or Jacobian) of $f$ is called the gradient of the function:

$$ \nabla f(x)=Df(x)^T $$

where $\nabla f(x)$ is a column vector and $Df(x)$ is a row vector. In the machine learning (ML) community, one often writes $$H(f(x))=\nabla^2 f(x)=\nabla\nabla^T f(x)=\nabla Df(x)=(D\nabla f(x))^T,$$ where the third equality is meant notationally, via the symbol $\nabla\nabla^T f$: it is not rigorous, but it makes sense when combined with the last equality, $(D\nabla f(x))^T$ (exactly the relation in Scott Staniewicz's answer). Numerically, taking the gradient of the derivative and transposing the derivative of the gradient both yield the Hessian matrix.

Note that the notation $\nabla\nabla^T f$ is correct and standard. Strictly speaking, $\nabla^2 f$ is not standard notation in the mathematical sense (it is often reserved for the Laplacian), although it is commonly used in the ML community.

Please do not take the third and fourth equalities too seriously; they are just other ways to connect the Jacobian, the Hessian, and the gradient.
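As a purely numerical sanity check of this chain, here is a small sketch using central finite differences in NumPy. The test function f, the evaluation point x0, and the step size h are illustrative assumptions, not anything from the book or this answer; it verifies that the matrix $(D\nabla f(x))^T$ comes out (approximately) symmetric, as the Hessian of a smooth function should.

```python
# Numerical sketch of H = (D grad f)^T with central finite differences.
# f, the point x0, and the step h are illustrative assumptions.
import numpy as np

def f(x):
    return x[0]**2 * x[1] + np.sin(x[0] * x[2])

def grad(f, x, h=1e-5):
    """Central-difference approximation of the gradient (first partials)."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def jac_of_grad(f, x, h=1e-5):
    """D(grad f): column j holds the partials of grad f with respect to x_j."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (grad(f, x + e) - grad(f, x - e)) / (2 * h)
    return J

x0 = np.array([0.3, -1.2, 0.7])
H = jac_of_grad(f, x0).T              # the Hessian, via (D grad f)^T
print(np.allclose(H, H.T, atol=1e-4)) # approximately symmetric: True
```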

ANSWER

Let us take these one by one. Following the numerator layout convention, the gradient of $f(x): \mathbf{R}^n \rightarrow \mathbf{R}$ with respect to $x$ is a column vector, as follows: $$ \nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}\\ \frac{\partial f}{\partial x_2}\\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} \in \mathbf{R}^n $$

The Hessian is the second-order derivative with respect to $x$; it is a square matrix whose entries are $[H_f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$, where $i$ is the row and $j$ is the column. The Hessian matrix is $$ H_f(x) = \nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}\\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x^2_2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x^2_n} \end{bmatrix} \in \mathbf{R}^{n \times n} $$
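For a concrete illustration, here is a short SymPy sketch that builds the Hessian entry by entry from this definition and compares it with sympy.hessian; the function $f$ here is an arbitrary smooth choice, not from the answer.

```python
# Building the Hessian entry by entry from H_ij = d^2 f / (dx_i dx_j).
# f is an illustrative smooth function.
import sympy as sp

x = sp.symbols('x1 x2 x3')
f = x[0]**2 * x[1] + sp.cos(x[1] * x[2])

n = len(x)
# Entry (i, j): differentiate f by x_i, then by x_j (i = row, j = column).
H = sp.Matrix(n, n, lambda i, j: sp.diff(f, x[i], x[j]))

assert H == sp.hessian(f, x)          # matches SymPy's built-in Hessian
assert H == H.T                       # symmetric, since f is smooth
```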

I would suggest having a look at Appendix D of Dattorro's book Convex Optimisation.

Now, regarding the relation between the gradient, the Jacobian, and the Hessian, here is a summary based on the same numerator layout convention.

  • The gradient is the transpose of the Jacobian, i.e. $\nabla f = (J f)^T$.
  • The Hessian is the derivative of the gradient, i.e. $H f = J(\nabla f)$.

Let's apply $J(\nabla f)$ to the first entry of the gradient, $\frac{\partial f}{\partial x_1}$. Here the Jacobian is just the row of partial-derivative operators $\frac{\partial}{\partial x}$, so it produces a row vector:

$$ \frac{\partial}{\partial x}\left ( \frac{\partial f}{\partial x_1} \right ) = \begin{bmatrix} \frac{\partial}{\partial x_1}\left ( \frac{\partial f}{\partial x_1} \right ) & \frac{\partial}{\partial x_2}\left ( \frac{\partial f}{\partial x_1} \right ) & \cdots & \frac{\partial}{\partial x_n}\left ( \frac{\partial f}{\partial x_1} \right ) \end{bmatrix} \in \mathbf{R}^{1 \times n}, $$ which matches the first row of the Hessian matrix above.

Just remember that $\frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial \left ( \frac{\partial f}{\partial x_1} \right )}{\partial x_2} = \frac{\partial \left ( \frac{\partial f}{\partial x_2} \right )}{\partial x_1} = \frac{\partial^2 f}{\partial x_2 \partial x_1}$; this equality of mixed partials is Clairaut's theorem, and it holds whenever the second partials are continuous.
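As a quick check of this equality of mixed partials, here is a small SymPy sketch on an illustrative smooth function (the choice of $f$ is an assumption for demonstration only).

```python
# Checking d^2 f/(dx1 dx2) = d^2 f/(dx2 dx1) for an illustrative smooth f.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 * sp.exp(x2) + x1 * x2**2

lhs = sp.diff(sp.diff(f, x1), x2)     # differentiate by x1, then x2
rhs = sp.diff(sp.diff(f, x2), x1)     # differentiate by x2, then x1
assert sp.simplify(lhs - rhs) == 0    # equal, by Clairaut's theorem
```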

A proof of the Hessian relation can be seen in Section A.4.3 of the B&V Convex Optimisation book, where the authors state: "the gradient mapping is the function $\nabla f: \mathbf{R}^n \rightarrow \mathbf{R}^n$, with $\mathbf{dom}\, \nabla f = \mathbf{dom}\, f$, with value $\nabla f(x)$ at $x$. The derivative of this mapping is $D \nabla f(x) = \nabla^2 f(x)$."

So, in the authors' words, the Hessian is the Jacobian of the gradient of $f(x)$ under the book's convention, which I believe is the numerator layout convention.