Dual spaces and gradients and subgradients


Suppose we have some function $f:{\mathbb R}^n \rightarrow \mathbb{R}$. Its gradient is defined as the vector that gives the directional derivative via $\langle v,\nabla f \rangle=D_{v}f$ for every direction $v$.

Could, or should, we think of $\nabla f$ as something belonging to the dual space of the domain of $f$? And if yes, what is the idea of going about this in this way? In particular are there some geometric ideas involved?

I ran into this idea while learning about subgradients and generalised subgradients, which are defined as functionals on the space of the domain of $f$.


There are 3 best solutions below

Answer (7 votes):

The best way to think about the derivative is this: $$ \tag{1} f(x) \approx f(x_0) + f'(x_0)(x - x_0). $$ The linear function on the right provides a simple but accurate approximation of $f$ near $x_0$. If $f:\mathbb R \to \mathbb R$, then $f'(x_0)$ is a real number. If $f:\mathbb R^n \to \mathbb R^m$, then $f'(x_0)$ is an $m \times n$ matrix.
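To see approximation (1) in action, here is a small numerical sketch; the particular function, point, and step are invented purely for illustration:

```python
import numpy as np

def f(x):
    # an arbitrary smooth function R^2 -> R, chosen only for illustration
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    # its gradient, computed by hand: (2*x0, 3)
    return np.array([2.0 * x[0], 3.0])

x0 = np.array([1.0, 2.0])
x = x0 + np.array([1e-3, -2e-3])  # a point near x0

exact = f(x)
linear = f(x0) + grad_f(x0) @ (x - x0)  # right-hand side of (1)

# the error of the linear approximation is O(|x - x0|^2),
# so for a step of size ~1e-3 it is of order 1e-6
print(abs(exact - linear))
```

Halving the step size should roughly quarter the error, which is a quick sanity check that the gradient is correct.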

In the special case that $m = 1$, $f'(x_0)$ is a row vector. If we use the convention (common in optimization) that $\nabla f(x_0)$ is a column vector, then we have $$ \nabla f(x_0) = f'(x_0)^T. $$

If we prefer to work with linear transformations rather than matrices, then we may choose to define the derivative to be a linear transformation (often denoted $Df(x_0)$) rather than a matrix. In this approach, equation (1) becomes $$ \tag{1} f(x) \approx f(x_0) + Df(x_0)(x - x_0). $$ When $m = 1$, $Df(x_0)$ is the linear functional that maps a vector $\Delta x$ to $\langle \nabla f(x_0), \Delta x \rangle$.
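In code the two viewpoints differ only in whether we store an array or a function; a minimal sketch (the function $f(x,y)=xy$ and all names are my own choices):

```python
import numpy as np

def grad_f(x):
    # gradient (column-vector convention) of the illustrative f(x, y) = x*y
    return np.array([x[1], x[0]])

def Df(x0):
    # the derivative as a linear functional: dx |-> <grad f(x0), dx>
    g = grad_f(x0)
    return lambda dx: g @ dx

x0 = np.array([2.0, 5.0])
L = Df(x0)
# applying the functional to a unit direction recovers the directional derivative
print(L(np.array([1.0, 0.0])))  # derivative in the x-direction: 5.0
```

This makes the dual-space reading concrete: `Df(x0)` is not a point of the domain but a map eating vectors of the domain and returning numbers.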

Answer (1 vote):

You are right: $\nabla f$ is vector information encoded in the dual space.

On a level curve $f(x)=a$ for a constant $a$, take a curve $C:I\to\Bbb R^n$ lying in the level set and form the composition $f\circ C:I\to\Bbb R$ with $(f\circ C)(t)=f(C(t))$. For every $x=C(t)$ on the level curve, $$f(C(t))=f(x)=a,$$ and differentiating in $t$ gives $$\nabla f(C(t))\cdot C'(t)=0$$ by the chain rule. The same holds for every $x\in f^{-1}(a)$.

So the gradient can be interpreted as a vector attached to each point of the level set which is perpendicular to the tangent vector $C'(t)$ at the corresponding point $C(t)=x$ of each level curve.
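A quick numerical check of this orthogonality, using the standard example $f(x,y)=x^2+y^2$, whose level sets are circles:

```python
import numpy as np

# level sets of f(x, y) = x^2 + y^2 are circles; parametrise the one of radius r
r = 2.0

def C(t):
    return np.array([r * np.cos(t), r * np.sin(t)])

def C_prime(t):
    # tangent vector to the level curve
    return np.array([-r * np.sin(t), r * np.cos(t)])

def grad_f(p):
    return np.array([2.0 * p[0], 2.0 * p[1]])

# by the chain rule, grad f(C(t)) . C'(t) = 0 at every t
for t in np.linspace(0.0, 2.0 * np.pi, 7):
    print(grad_f(C(t)) @ C_prime(t))  # numerically zero for every t
```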

Answer (3 votes):

As I answered this myself, please upvote if it makes sense!

We have

$$f(x)=f(a)+\langle\nabla f(a),x-a\rangle + \text{Error} =f(a)+f_{x}(a)(x_{1}-a_{1})+f_{y}(a)(x_{2}-a_{2})+ \text{Error},$$ writing the expansion out for $n=2$.

Hence we can think of the gradient as a linear functional generating the tangent plane by collecting all the directional derivatives.

Now the subgradients are functionals $g$ on the domain such that

$f(x)\ge f(a) + g(x-a)=f(a) + y^{T}(x-a)$

where the second equality follows from the Riesz representation theorem. We thus have functionals whose affine approximations lie below, or touch, the graph in every direction. In analogy with the one-dimensional case, these must be affine subspaces, i.e. planes for functions of two variables, since any such functional has as its coordinates a normal to some plane.
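The defining inequality is easy to test numerically. A sketch using the classic one-dimensional example $f(x)=|x|$, which is not differentiable at $a=0$ but has every slope $g\in[-1,1]$ as a subgradient there (the helper name and sample grid are my own):

```python
# subgradient check for f(x) = |x| at a point a:
# g is a subgradient iff f(x) >= f(a) + g*(x - a) for all x

def f(x):
    return abs(x)

def is_subgradient(g, a, samples):
    # verify the subgradient inequality on a grid of test points
    # (a small tolerance guards against floating-point round-off)
    return all(f(x) >= f(a) + g * (x - a) - 1e-12 for x in samples)

samples = [x / 10.0 for x in range(-50, 51)]
print(is_subgradient(0.5, 0.0, samples))   # True: 0.5 lies in [-1, 1]
print(is_subgradient(1.5, 0.0, samples))   # False: the line cuts above the graph
```

Each valid $g$ corresponds to a supporting line through $(0, 0)$ that stays below the graph, which is exactly the "plane below the graph" picture described above, one dimension down.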