The gradient of a function $f:E\to\mathbb{R}$ is Lipschitz continuous with parameter $L > 0$ iff $$\|\nabla f(x) - \nabla f(y)\|^* \le L\|x-y\| \quad \forall x,y\in E.$$
I have two questions:
- Why is the gradient in the dual space?
- Since the dual norm of $\ell_p$ is $\ell_q$ with $\frac{1}{p}+\frac{1}{q}=1$ and the dual norm of $\ell_1$ is $\ell_\infty$, we have: \begin{align} \|\nabla f(x) - \nabla f(y)\|_\infty &\le L\|x-y\|_1 \\ \|\nabla f(x) - \nabla f(y)\|_{\frac{p}{p-1}} &\le L\|x-y\|_p \quad \forall p>1. \end{align} Is this correct?
Thank you in advance.
The gradient of a function is a linear functional that gives a first-order approximation to the function: $$ f(x+h) = f(x) + \color{blue}{\nabla f(x) h} + o(\|h\|), \quad \|h\|\to0 $$ (I use the Fréchet definition of the gradient here.) In the scalar case, $\nabla f(x) h$ is just $f'(x)h$, a product of two numbers. In $\mathbb R^n$, it is $\nabla f(x) \cdot h$, an inner product. But when $h$ is in a Banach space $E$, a continuous linear functional on $E$ is by definition an element of the dual space; thus, $\nabla f(x)$ can only be understood as an element of $E^*$.
One could say that $\nabla f(x)$ was always in the dual space; we just did not notice it before because the dual of $\mathbb R^n$ is habitually identified with $\mathbb R^n$ itself.
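To see concretely why the dual norm is the natural way to measure a gradient, here is a minimal sketch in finite dimensions. It checks Hölder's inequality $|\langle g, h\rangle| \le \|g\|_q \|h\|_p$ for random directions $h$, which is what makes $\|g\|_q$ the right size for a functional acting on $(\mathbb R^n, \|\cdot\|_p)$. The vector `g`, the choice $p=3$, and the helper names `p_norm`, `pairing`, and `holder_gap` are my own illustrative assumptions, not anything from the question.

```python
import random

def p_norm(v, p):
    """The l_p norm of a finite vector, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def pairing(g, h):
    """Action of the functional g on the vector h, i.e. sum of g_i * h_i."""
    return sum(a * b for a, b in zip(g, h))

def holder_gap(g, p, trials=1000, seed=0):
    """Smallest observed value of ||g||_q * ||h||_p - |<g, h>| over random h."""
    q = p / (p - 1.0)                 # conjugate exponent, 1/p + 1/q = 1
    rng = random.Random(seed)
    g_dual = p_norm(g, q)             # g is measured in the dual norm
    gap = float("inf")
    for _ in range(trials):
        h = [rng.uniform(-1.0, 1.0) for _ in g]
        gap = min(gap, g_dual * p_norm(h, p) - abs(pairing(g, h)))
    return gap

# Hypothetical gradient vector, chosen only for illustration.
g = [0.7, -1.2, 0.4, 2.5]
print(holder_gap(g, p=3.0))           # nonnegative up to rounding error
```

The gap never goes negative, so $\|g\|_q$ bounds how much the linear term $\nabla f(x) h$ can change per unit of $\|h\|_p$.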
The formulas you wrote for the special case $E=\ell_p$ are correct.
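As a sanity check of the $p=1$ formula, here is a small sketch using a function whose gradient really is Lipschitz: the quadratic $f(x)=\tfrac12 x^\top A x$ with symmetric $A$, so that $\nabla f(x)=Ax$. In that case $\|A(x-y)\|_\infty \le \max_{i,j}|A_{ij}|\,\|x-y\|_1$, so $L=\max_{i,j}|A_{ij}|$ works. The matrix `A` and the helper names are assumptions chosen purely for illustration.

```python
import random

def grad_quadratic(A, x):
    """Gradient of f(x) = 0.5 * x^T A x for symmetric A, which is A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def norm_1(v):
    return sum(abs(t) for t in v)

def norm_inf(v):
    return max(abs(t) for t in v)

def worst_ratio(A, trials=1000, seed=0):
    """Largest observed ||grad f(x) - grad f(y)||_inf / ||x - y||_1."""
    rng = random.Random(seed)
    n = len(A)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        diff = [a - b for a, b in zip(x, y)]
        gx, gy = grad_quadratic(A, x), grad_quadratic(A, y)
        gdiff = [a - b for a, b in zip(gx, gy)]
        worst = max(worst, norm_inf(gdiff) / norm_1(diff))
    return worst

# Hypothetical symmetric matrix, chosen only for illustration.
A = [[2.0, -1.0, 0.5],
     [-1.0, 3.0, 0.0],
     [0.5, 0.0, 1.0]]

L = max(abs(a) for row in A for a in row)   # Lipschitz constant for (l_1, l_inf)
print(worst_ratio(A) <= L)
```

Random sampling only gives evidence, not a proof; the bound itself follows from $|\sum_j A_{ij} h_j| \le \max_j |A_{ij}| \sum_j |h_j|$.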
As an exercise, you may want to find $\nabla f(x)$ explicitly for $f(x)=\|x\|$ and check that it indeed lives in the dual space.
For example, if $p=1$, then the gradient of the norm at $x=(x_i)$ is $(\operatorname{sign}x_i)$, assuming none of the $x_i$ is zero. (Otherwise, the norm is not differentiable at $x$.) As you can see, this gradient is a vector of $\pm 1$, so it makes sense that it lives in $\ell_\infty$.
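The $p=1$ example is easy to check numerically in finite dimensions. The sketch below compares a central finite-difference gradient of the $\ell_1$ norm with the sign vector at a point with no zero coordinates. The point `x`, the step `eps`, and the helper names are illustrative assumptions.

```python
def l1_norm(x):
    return sum(abs(t) for t in x)

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2.0 * eps))
    return grad

def sign(t):
    """Sign of a real number: -1, 0, or 1."""
    return (t > 0) - (t < 0)

# Hypothetical point with no zero coordinates, so the norm is differentiable there.
x = [1.5, -0.3, 2.0, -4.0]

g_num = numerical_gradient(l1_norm, x)
g_exact = [sign(t) for t in x]

print(g_num)
print(g_exact)
print(max(abs(t) for t in g_exact))   # l_inf norm of the gradient
```

The two vectors agree up to rounding, and the $\ell_\infty$ norm of the gradient is $1$ no matter how large $x$ is, which is the finite-dimensional picture behind "the gradient lives in $\ell_\infty$".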