Can one derive the Hessian as a composition $\nabla \circ \nabla$?


In our lecture on continuous optimization there is a lot of operator-overloading when it comes to the $\nabla$-operator.

I originally learned it as the gradient, which for a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined like this:

$$\nabla f(x) := \pmatrix{\frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n}}$$

or written differently

$$\nabla f(x) := \sum_{i=1}^n e_i \frac{\partial f(x)}{\partial x_i}$$

where $e_i$ is the $i$-th unit vector in $\mathbb{R}^n$. Now my problem arises since we often write $\nabla^2 f$, and similarly we write $\nabla h$ for a function $h: \mathbb{R}^n \to \mathbb{R}^p$, where we mean the Hessian and the Jacobian matrix, respectively.
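To make the definition concrete, here is a minimal numeric sketch (the function $f$ and evaluation point are my own example choices, not from the lecture): the gradient above, approximated by central finite differences.

```python
def f(x):
    # Example: f(x1, x2) = x1^2 + 3*x1*x2, so the exact gradient is
    # (2*x1 + 3*x2, 3*x1).
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad(g, x, h=1e-6):
    """Approximate (∂g/∂x_1, ..., ∂g/∂x_n) by central differences."""
    out = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        out.append((g(xp) - g(xm)) / (2 * h))
    return out

# At x = (1, 2) the exact gradient is (2*1 + 3*2, 3*1) = (8, 3).
g = grad(f, [1.0, 2.0])
```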

My point of confusion is whether this is just lazy notation that is not strictly accurate, or whether I fail to understand how the operator works. By the definition above and the linearity of the derivative I would have

$$\nabla^2 f(x) = (\nabla \circ \nabla) f(x) = \nabla (\nabla f(x)) = \nabla \sum_{i=1}^n e_i \frac{\partial f(x)}{\partial x_i} = \sum_{i=1}^n e_i \nabla \frac{\partial f(x)}{\partial x_i}$$

and since also $\frac{\partial f(x)}{\partial x_i} : \mathbb{R}^n \to \mathbb{R}$, I should by the same logic get something like

$$\sum_{i=1}^n e_i \nabla \frac{\partial f(x)}{\partial x_i} = \sum_{i=1}^n e_i \sum_{j=1}^n e_j \frac{\partial^2 f(x)}{\partial x_j \partial x_i}$$

To me this looks more like a column vector whose entries are themselves column vectors, rather than the Hessian matrix.
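The "vector of column vectors" reading can actually be carried out numerically, and its entries do coincide with the Hessian entries. A hedged sketch (the function $f$ and point $x$ are example choices of mine), applying the gradient again to each partial derivative $\partial f/\partial x_i$:

```python
def f(x):
    # Example: f(x1, x2) = x1^2*x2 + x2^3
    return x[0] ** 2 * x[1] + x[1] ** 3

def grad(g, x, h=1e-4):
    """Central-difference gradient of g at x."""
    out = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        out.append((g(xp) - g(xm)) / (2 * h))
    return out

def partial(g, i, h=1e-4):
    """∂g/∂x_i as a new function R^n -> R, so ∇ can be applied to it again."""
    def dg(x):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        return (g(xp) - g(xm)) / (2 * h)
    return dg

x = [1.0, 2.0]
# The i-th entry of the "vector of vectors" is the column vector ∇(∂f/∂x_i).
cols = [grad(partial(f, i), x) for i in range(2)]
# Laying these columns side by side gives the Hessian: for this f,
# H = [[2*x2, 2*x1], [2*x1, 6*x2]] = [[4, 2], [2, 12]] at x = (1, 2).
```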

Edit: It has been pointed out here that the $\nabla^2$ notation likely refers to the dyadic/outer product of the $\nabla$-'vector'. This raises the question, though, of whether that notation can be rigorously justified/derived, i.e. whether one can show that $(\nabla \circ \nabla)f = (\nabla \nabla^\top)f$.


There are 3 answers below.

Answer 1

First, I see a typo in $$\tag{1} \nabla^2 f(x) = \nabla (\nabla f(x)) = \color{red}{\nabla}\sum_{i=1}^n e_i \frac{\partial f(x)}{\partial x_i}\,. $$ I find it hard to make sense of the expression $$ \sum_{i=1}^n e_i \nabla \frac{\partial f(x)}{\partial x_i} $$ because (1) is the scalar product of the vector $\nabla$ with the vector $\nabla f(x)\,.$ This leads to $$ \nabla^2f(x)=\sum_{i=1}^n\frac{\partial^2}{\partial x_i^2}f(x)=\Delta f(x)\,. $$
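The claim that reading $\nabla^2$ as the scalar product $\nabla \cdot \nabla$ gives the Laplacian (the sum of pure second partials, i.e. the trace of the Hessian) can be checked numerically. A sketch, where the function and the evaluation point are my own example choices:

```python
import math

def f(x):
    # Example: f(x1, x2) = sin(x1) * x2^2
    return math.sin(x[0]) * x[1] ** 2

def second_partial(g, x, i, h=1e-4):
    """Central-difference approximation of ∂²g/∂x_i² at x."""
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    return (g(xp) - 2 * g(x) + g(xm)) / h ** 2

x = [0.5, 1.5]
# Δf = Σ_i ∂²f/∂x_i² — the trace of the Hessian.
laplacian = sum(second_partial(f, x, i) for i in range(2))
# Exact value: Δf = -sin(x1)*x2^2 + 2*sin(x1).
expected = -math.sin(0.5) * 1.5 ** 2 + 2 * math.sin(0.5)
```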

Answer 2

In two dimensions, $\nabla^2$ usually denotes the Laplacian: $$ \nabla^2 = \partial_x^2 + \partial_y^2$$

The twofold composition $\nabla \circ \nabla$ of the gradient operator, on the other hand, has a meaning of its own when evaluated carefully on a scalar field: it yields the matrix that transforms displacements in the input space into the corresponding change of the gradient vector, i.e. the Hessian.
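This "displacement in, gradient change out" reading can be sketched numerically: for a small displacement $d$, $\nabla f(x + d) - \nabla f(x) \approx H d$. The function, point, and displacement below are example choices of mine.

```python
def f(x):
    # Example: f(x1, x2) = x1^2*x2 + x2^3
    return x[0] ** 2 * x[1] + x[1] ** 3

def grad(g, x, h=1e-6):
    """Central-difference gradient of g at x."""
    out = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        out.append((g(xp) - g(xm)) / (2 * h))
    return out

x = [1.0, 2.0]
d = [1e-3, -2e-3]               # a small displacement of the input
# Exact Hessian of this f at x: [[2*x2, 2*x1], [2*x1, 6*x2]]
H = [[2 * x[1], 2 * x[0]], [2 * x[0], 6 * x[1]]]
Hd = [sum(H[i][j] * d[j] for j in range(2)) for i in range(2)]
xd = [x[i] + d[i] for i in range(2)]
# Change of the gradient under the displacement; agrees with H d up to O(|d|^2).
dgrad = [a - b for a, b in zip(grad(f, xd), grad(f, x))]
```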

Answer 3

Maybe you can see the del operator as $$ \nabla = \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_N} \end{pmatrix} $$ and simply interpret the Hessian as the outer product $$ \mathbf{H}= \nabla \nabla^T $$ Thus $$ \mathbf{Hf}= \nabla \nabla^Tf= \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_N} \end{pmatrix} \begin{pmatrix} \frac{\partial f}{\partial x_1} & \ldots & \frac{\partial f}{\partial x_N} \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_N \partial x_1} & \ldots & \frac{\partial^2 f}{\partial x_N^2} \\ \end{pmatrix} $$
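A small numeric sketch of the $\nabla \nabla^T f$ reading (the function and point are my own example choices): entry $(m, n)$ is $\partial/\partial x_m$ applied to $\partial f/\partial x_n$, here via nested central differences.

```python
def f(x):
    # Example: f(x1, x2) = x1^3 + x1*x2^2
    return x[0] ** 3 + x[0] * x[1] ** 2

def partial(g, i, h=1e-4):
    """∂g/∂x_i as a function, so a second partial is just partial(partial(...))."""
    def dg(x):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        return (g(xp) - g(xm)) / (2 * h)
    return dg

x = [2.0, 1.0]
# H[m][n] = ∂/∂x_m (∂f/∂x_n); closed form here: [[6*x1, 2*x2], [2*x2, 2*x1]]
H = [[partial(partial(f, n), m)(x) for n in range(2)] for m in range(2)]
```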

UPDATE: Let us write $\nabla^T f(\mathbf{x})=\sum_n \frac{\partial f}{\partial x_n} \mathbf{e}_n^T$

Thus $$\mathbf{Hf} =\sum_n \nabla \left(\frac{\partial f}{\partial x_n} \right) \mathbf{e}_n^T =\sum_n \sum_m \frac{\partial}{\partial x_m} \left(\frac{\partial f}{\partial x_n} \right) \mathbf{e}_m \mathbf{e}_n^T= \sum_{m,n} \frac{\partial^2 f}{\partial x_m \partial x_n} \mathbf{e}_m \mathbf{e}_n^T $$
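The sum above assembles the Hessian from rank-one pieces $\frac{\partial^2 f}{\partial x_m \partial x_n}\,\mathbf{e}_m \mathbf{e}_n^T$. A small sketch of that decomposition (the second partials are those of the example $f(x) = x_1^2 x_2$ at $x = (1, 3)$, a choice of mine):

```python
def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]

n = 2
# Standard basis vectors e_1, ..., e_n
e = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Second partials of f(x) = x1^2*x2 at x = (1, 3): [[2*x2, 2*x1], [2*x1, 0]]
d2 = [[6.0, 2.0], [2.0, 0.0]]

# H = Σ_{m,k} ∂²f/∂x_m∂x_k · e_m e_k^T, one rank-one term at a time
H = [[0.0] * n for _ in range(n)]
for m in range(n):
    for k in range(n):
        term = outer(e[m], e[k])
        for i in range(n):
            for j in range(n):
                H[i][j] += d2[m][k] * term[i][j]
```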