Suppose we have a function $f:{\mathbb R}^n \rightarrow \mathbb{R}$. Its gradient is defined as the vector that gives the directional derivative via $\langle v, \nabla f \rangle = D_{v}f$ for any direction $v$.
Could, or should, we think of $\nabla f$ as an element of the dual space of the domain of $f$? If so, what is gained by taking this point of view? In particular, are there geometric ideas involved?
I ran into this idea while learning about subgradients and generalised subgradients, which are defined as functionals on the domain of $f$.
The best way to think about the derivative is this: $$ \tag{1} f(x) \approx f(x_0) + f'(x_0)(x - x_0). $$ The linear function on the right provides a simple but accurate approximation of $f$ near $x_0$. If $f:\mathbb R \to \mathbb R$, then $f'(x_0)$ is a real number. If $f:\mathbb R^n \to \mathbb R^m$, then $f'(x_0)$ is an $m \times n$ matrix.
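As a numerical sketch of equation (1), take the hypothetical function $f(x, y) = x^2 y$ (chosen here just for illustration), whose derivative at a point is the $1 \times 2$ matrix of partial derivatives:

```python
import numpy as np

# Hypothetical example: f(x, y) = x^2 * y, a map from R^2 to R.
def f(v):
    x, y = v
    return x**2 * y

# Its derivative at v0 is the 1x2 row vector of partials [2*x*y, x^2].
def fprime(v):
    x, y = v
    return np.array([[2 * x * y, x**2]])

v0 = np.array([1.0, 2.0])   # base point x_0
v = np.array([1.1, 2.05])   # nearby point x

# Linear approximation from equation (1): f(x) ≈ f(x_0) + f'(x_0)(x - x_0)
approx = f(v0) + (fprime(v0) @ (v - v0)).item()
exact = f(v)

print(exact, approx)  # 2.4805 vs 2.45 — close, since v is near v0
```

The $1 \times 2$ matrix acts on the $2 \times 1$ displacement $x - x_0$ to produce a scalar, exactly as the general $m \times n$ case predicts for $m = 1$.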
In the special case that $m = 1$, $f'(x_0)$ is a row vector. If we use the convention (common in optimization) that $\nabla f(x_0)$ is a column vector, then we have $$ \nabla f(x_0) = f'(x_0)^T. $$
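Continuing the same hypothetical $f(x, y) = x^2 y$, the shapes make the convention concrete: the derivative is a $1 \times 2$ row vector, and the gradient under the optimization convention is its transpose, a $2 \times 1$ column vector.

```python
import numpy as np

# Derivative of the hypothetical f(x, y) = x^2 * y at the point (1, 2):
# a 1x2 row vector [df/dx, df/dy] = [2*x*y, x^2] = [4, 1].
fprime = np.array([[4.0, 1.0]])

# Optimization convention: the gradient is the transpose, a column vector.
grad = fprime.T

print(fprime.shape)  # (1, 2)
print(grad.shape)    # (2, 1)
```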
If we prefer to work with linear transformations rather than matrices, then we may choose to define the derivative to be a linear transformation (often denoted $Df(x_0)$) rather than a matrix. In this approach, equation (1) becomes $$ \tag{1'} f(x) \approx f(x_0) + Df(x_0)(x - x_0). $$ When $m = 1$, $Df(x_0)$ is the linear functional that maps a vector $\Delta x$ to $\langle \nabla f(x_0), \Delta x \rangle$.
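This last identification can be sketched numerically, again with the hypothetical $f(x, y) = x^2 y$ at $x_0 = (1, 2)$: applying the functional $Df(x_0)$ to a displacement $\Delta x$ is the same as taking the inner product with $\nabla f(x_0)$.

```python
import numpy as np

# Gradient of the hypothetical f(x, y) = x^2 * y at x0 = (1, 2):
# [2*x*y, x^2] evaluated at (1, 2) gives [4, 1].
grad = np.array([4.0, 1.0])

# Df(x0) as a linear functional: it eats a vector and returns a scalar,
# via the inner product with the gradient.
def Df_x0(delta_x):
    return float(np.dot(grad, delta_x))

delta = np.array([0.1, 0.05])
print(Df_x0(delta))  # 4*0.1 + 1*0.05 = 0.45
```

The functional view and the row-vector view agree: the row vector $f'(x_0)$ and the functional $\Delta x \mapsto \langle \nabla f(x_0), \Delta x \rangle$ compute the same number, which is why identifying $\nabla f$ with an element of the dual space is natural.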