In our lecture on continuous optimization there is a lot of operator-overloading when it comes to the $\nabla$-operator.
I originally know it as the gradient that, for a function $f: \mathbb{R}^n \to \mathbb{R}$ was defined like this:
$$\nabla f(x) := \pmatrix{\frac{\delta f(x)}{\delta x_1} \\ \vdots \\ \frac{\delta f(x)}{\delta x_n}}$$
or written differently
$$\nabla f(x) := \sum_{i=1}^n e_i \frac{\delta f(x)}{\delta x_i}$$
where $e_i$ is the i-th unit vector in $\mathbb{R}^n$. Now my problem arises since we often write $\nabla^2 f$ and similarly we write $\nabla h$ for a function $f: \mathbb{R}^n \to \mathbb{R}^p$ where we mean the Hessian and the Jakobi Matrix respectively.
My point of confusion is the question whether we just use lazy notation that is not strictly accurate, or whether I fail to understand how this operator works. By the definition above and the linearity of the derivate I would have
$$\nabla^2 f(x) = (\nabla \circ \nabla) f(x) = \nabla (\nabla f(x)) = \nabla \sum_{i=1}^n e_i \frac{\delta f(x)}{\delta x_i} = \sum_{i=1}^n e_i \nabla \frac{\delta f(x)}{\delta x_i}$$
and since also $\frac{\delta f(x)}{\delta x_i} : \mathbb{R}^n \to \mathbb{R}$ I should by the same logic get something like
$$\sum_{i=1}^n e_i \nabla \frac{\delta f(x)}{\delta x_i} = \sum_{i=1}^n e_i \sum_{j=1}^n e_j \frac{\delta^2 f(x)}{\delta x_j \delta x_i}$$
To me this seems more like column vectors whose entries are themselves column vectors instead of the Hessian Matrix.
Edit: It has been pointed out here that what the $\nabla^2$ notation likely refers to the dyadic/outer product of the $\nabla$-'vector'. This begs the question though if this notation can be rigorously justified/derived i.e. whether one can show that $(\nabla \circ \nabla)f = (\nabla \nabla^\top)f$
First, I see a typo in $$\tag{1} \nabla^2 f(x) = \nabla (\nabla f(x)) = \color{red}{\nabla}\sum_{i=1}^n e_i \frac{\delta f(x)}{\delta x_i}\,. $$ I find it hard to understand the expression $$ \sum_{i=1}^n e_i \nabla \frac{\delta f(x)}{\delta x_i} $$ because (1) is the scalar product of the vector $\nabla$ and the vector $\nabla f(x)\,.$ This leads to $$ \nabla^2f(x)=\sum_{i=1}^n\frac{\delta^2}{\delta x_i^2}f(x)=\Delta f(x)\,. $$