On "the Hessian is the Jacobian of the gradient"

According to Wikipedia,

The Hessian matrix of a function $f$ is the Jacobian matrix of the gradient of the function $f$; that is: $H(f(x)) = J(\nabla f(x))$.

Suppose $f : \Bbb R^m \to \Bbb R^n$, $x \mapsto f(x)$, and $f \in C^2 (\Bbb R^m)$. Here I regard points in $\Bbb R^m$ and $\Bbb R^n$ as column vectors, so $f$ sends column vectors to column vectors. When $n=1$, we can define $\nabla f: \Bbb R^m \to (\Bbb R^m)^t$, $x\mapsto\nabla f(x)$, which sends column vectors to row vectors. I use $(\Bbb R^m)^t$ to denote the space of row vectors; this is just ad hoc notation.

We do have a good definition of the Jacobian for functions that send column vectors to column vectors, but what can we say about functions that send column vectors to row vectors?

I discovered that if I treat $\nabla f(x)$ as a column vector, then I know how to do the calculation, and my result agrees with Wikipedia. But I don't think we can simply "manipulate $\nabla f(x)$ as a column vector".

There is 1 best solution below

The gradient and the Jacobian are related by a transpose. Suppose $f:\mathbb{R}^n\to\mathbb{R}$ is some $C^1$ function.

$J_f$ takes a point of $\mathbb{R}^n$ and produces a $1\times n$ matrix that is able to calculate directional derivatives. That is to say, if $\vec{v}\in\mathbb{R}^n$ is a vector and $p\in\mathbb{R}^n$ is a point, then $J_f(p)\vec{v}$ is a $1\times 1$ matrix whose entry is the directional derivative at $p$ in the $\vec{v}$ direction.

$\nabla f$ takes a point of $\mathbb{R}^n$ and produces a vector in $\mathbb{R}^n$ that can be used to calculate directional derivatives using the dot product. That is, with the same $p$ and $\vec{v}$ above, $\vec{v}\cdot\nabla f(p)$ is the directional derivative at $p$ in the $\vec{v}$ direction.

Recall that if $\vec{v},\vec{w}\in\mathbb{R}^n$ are vectors, then $\vec{v}\cdot\vec{w}=\vec{v}^T\vec{w}$, where we identify $1\times 1$ matrices with scalars so that this equation makes sense. Since the dot product is symmetric, we have $\vec{v}\cdot\nabla f(p) = (\nabla f(p))^T\vec{v}$, where the latter is commonly written as $(\nabla^Tf(p))\vec{v}$. This also represents the directional derivative, and it holds for all $\vec{v}$ and $p$, so $\nabla^Tf=J_f$. In other words, the gradient is a column vector by definition, and the row vector is the Jacobian.
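As a quick numerical sanity check of $J_f(p)\vec{v}=\vec{v}\cdot\nabla f(p)$, here is a minimal sketch in plain Python. The example function $f(x,y)=x^2y+\sin y$, the point `p`, the direction `v`, and the helper name `directional_derivative` are all illustrative assumptions, not anything from the question. The gradient is computed by hand and the directional derivative is approximated by a central difference.

```python
import math

def f(x):
    # Hypothetical example: f(x, y) = x^2 * y + sin(y), a map R^2 -> R
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # Hand-computed gradient of f, stored as a plain list of n entries
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def directional_derivative(func, p, v, h=1e-6):
    # Central-difference approximation of d/dt func(p + t*v) at t = 0
    forward = [pi + h * vi for pi, vi in zip(p, v)]
    backward = [pi - h * vi for pi, vi in zip(p, v)]
    return (func(forward) - func(backward)) / (2 * h)

p = [1.0, 2.0]   # assumed base point
v = [3.0, -1.0]  # assumed direction

# J_f(p) is the 1 x n row with the same entries as grad f(p),
# so J_f(p) v is the sum of entrywise products, i.e. the dot product.
row_times_v = sum(g * vi for g, vi in zip(grad_f(p), v))
numeric = directional_derivative(f, p, v)

print(row_times_v, numeric)
```

The two printed numbers should agree up to finite-difference error, which is all that "the row $J_f(p)$ and the column $\nabla f(p)$ carry the same entries" amounts to in practice.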


Let's take a look at the equation $H_f=J_{\nabla f}$ for the Hessian, where $f:\mathbb{R}^n\to\mathbb{R}$ is a $C^2$ function. We have that $\nabla f$ is a function $\mathbb{R}^n\to\mathbb{R}^n$, taking points to column vectors, and so $J_{\nabla f}(p)$ is an $n\times n$ matrix. Since $(\nabla f(p))_i = \frac{\partial f}{\partial x_i}(p)$, we get $$(H_f(p))_{ij} = (J_{\nabla f}(p))_{ij} = \frac{\partial}{\partial x_j}(\nabla f(p))_i = \frac{\partial^2f}{\partial x_j\partial x_i}(p),$$ as expected. Because $f$ is $C^2$, the mixed partial derivatives commute, so this matrix is symmetric and the order of the indices does not matter.
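To see $H_f=J_{\nabla f}$ concretely, here is a small sketch in plain Python that differentiates a hand-computed gradient entry by entry and compares the result with a hand-computed Hessian. The example function $f(x,y)=x^2y+\sin y$, the point `p`, and the helper names `jacobian` and `hessian_exact` are assumptions made for illustration; the finite-difference Jacobian is only an approximation, not a symbolic derivative.

```python
import math

def grad_f(x):
    # Gradient of the hypothetical example f(x, y) = x^2 * y + sin(y),
    # viewed as a map R^2 -> R^2 (points to column vectors)
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def jacobian(g, p, h=1e-6):
    # Entry (i, j) approximates d g_i / d x_j with a central difference
    n = len(p)
    m = len(g(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        fwd = list(p)
        bwd = list(p)
        fwd[j] += h
        bwd[j] -= h
        g_fwd, g_bwd = g(fwd), g(bwd)
        for i in range(m):
            J[i][j] = (g_fwd[i] - g_bwd[i]) / (2 * h)
    return J

def hessian_exact(x):
    # Second partial derivatives of f, computed by hand
    return [[2 * x[1], 2 * x[0]],
            [2 * x[0], -math.sin(x[1])]]

p = [1.0, 2.0]  # assumed evaluation point
H_numeric = jacobian(grad_f, p)
H_exact = hessian_exact(p)

max_error = max(
    abs(H_numeric[i][j] - H_exact[i][j])
    for i in range(2)
    for j in range(2)
)
print(H_numeric)
print(max_error)
```

The off-diagonal entries of `H_numeric` should also match each other up to rounding, reflecting the symmetry of the Hessian for a $C^2$ function.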