Let $\Theta$ be a compact $d$-dimensional Riemannian manifold without boundary and $M(\Theta)$ (resp. $M_+(\Theta)$) denote the set of signed (resp. nonnegative) finite Borel measures on $\Theta$.
What is the Fréchet derivative of the total variation norm given below? $$ \| \cdot \|_{\text{TV}} \colon M(\Theta) \to \mathbb{R}_{\ge 0}, \qquad \mu \mapsto \| \mu \|_{\text{TV}} $$ Is it even differentiable at $\mu = 0$?
Context: in the paper L. Chizat - Sparse Optimization on Measures with Over-parametrized Gradient Descent, the following setting (A1) is considered: let $F$ be a Hilbert space and let $\phi \colon \Theta \to F$ and $R \colon F \to \mathbb{R}$ each be twice Fréchet differentiable with locally Lipschitz second-order derivatives, such that $\nabla R$ is bounded on sublevel sets. (Does this mean on the sublevel sets of $R$?)
Chizat claims on page 5 that
the objective \begin{equation*} J \colon M_+(\Theta) \to \mathbb R, \qquad \nu \mapsto R\left(\int_{\Theta} \phi(\theta) \,\text{d}\nu(\theta)\right) + \lambda \| \nu \|_{\text{TV}}, \end{equation*} which can easily be extended to $M(\Theta)$ (see Appendix A in that paper, which is also available on arXiv), is Fréchet differentiable and its differential at $\nu \in M(\Theta)$ can be represented by $$ J'_{\nu} \colon \Theta \to \mathbb{R}, \qquad \theta \mapsto \left\langle \phi(\theta), \nabla R\left(\int_{\Theta} \phi(\theta') \,\text{d}\nu(\theta')\right) \right\rangle_{F} + \lambda $$ in the sense that $\frac{\text{d}}{\text{d} \varepsilon} J(\nu + \varepsilon \sigma) \big|_{\varepsilon = 0} = \int_{\Theta} J'_{\nu}(\theta) \,\text{d}\sigma(\theta)$.
Using that the Fréchet derivative is linear and just focussing on the second term (with $\lambda$), this would imply that $D \| \cdot \|_{\text{TV}}(\nu)[\sigma] = \| \sigma \|_{\text{TV}}$, where $D f(x)[h] \in Y$ is the Fréchet derivative of $f \colon X \to Y$ at $x \in X$ in direction $h \in X$.
We have that if $f$ is linear, then $D f(x)[h] = f(h)$ for all $x, h \in X$. Does the converse also hold? If yes, this would imply that the total variation norm is linear, which is surely not true.
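For what it's worth, here is my own sanity check at $\nu = 0$ (not from the paper): for any $\sigma \in M(\Theta)$, $$ \frac{\| 0 + \varepsilon \sigma \|_{\text{TV}} - \| 0 \|_{\text{TV}}}{\varepsilon} = \frac{|\varepsilon|}{\varepsilon} \, \| \sigma \|_{\text{TV}} \xrightarrow[\varepsilon \to 0^{\pm}]{} \pm \| \sigma \|_{\text{TV}}, $$ so for $\sigma \neq 0$ the two one-sided limits disagree, and no linear functional can represent the derivative of $\| \cdot \|_{\text{TV}}$ at $0$.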
What you've written here is inconsistent with what I remembered from the paper, so I followed both links (Springer & arXiv). Your definition of $J$ is not what's in the paper.
First, $J$ on page 5 of the arXiv version (page 6 of the Springer version) is defined on $M$, not on $M_+$. ($M_+$ would be wrong here, since it is not a vector space, while $M$ is.) It is the optimization that is over $M_+$.
Second, the last term in $J$ is the total mass $\nu(\Theta)$, not $\|\nu\|_{\text{TV}}$. The norm on a Banach space is in general not Fréchet differentiable. What is being differentiated there with respect to $\nu$ is the map $\nu \mapsto \nu(\Theta)$, not the total variation $\nu \mapsto \|\nu\|_{\text{TV}}$, and its derivative is represented by the constant $1$. Indeed, for $J(\nu) = \nu(\Theta) = \int_\Theta \text{d}\nu(\theta)$, the map $J$ is linear on $M$, so its derivative is itself: $\text{d}J_\nu(\sigma) = \sigma(\Theta) = \int_\Theta 1 \, \text{d}\sigma(\theta)$. Comparing this with the definition of $J'_\nu$, you get $J'_\nu(\theta) = 1$ for all $\theta \in \Theta$.
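To make the distinction concrete, here is a small numerical sketch of my own (an illustration, not from the paper): assume a discrete $\Theta$ with four atoms, so signed measures become vectors in $\mathbb{R}^4$, the total mass is the coordinate sum, and the TV norm is the $\ell^1$ norm.

```python
import numpy as np

# Finite-dimensional stand-in: Theta = {theta_1, ..., theta_4}, so a
# signed measure is just a vector nu in R^4 (an assumption for illustration).
rng = np.random.default_rng(0)
nu = rng.normal(size=4)      # arbitrary base measure
sigma = rng.normal(size=4)   # arbitrary direction

def mass(m):
    # nu -> nu(Theta): total mass, a linear functional
    return m.sum()

def tv(m):
    # nu -> ||nu||_TV: here the l^1 norm, which is NOT linear
    return np.abs(m).sum()

eps = 1e-6

# Total-mass map: the directional derivative equals sigma(Theta) at
# every nu, matching J'_nu(theta) = 1 for the last term.
d_mass = (mass(nu + eps * sigma) - mass(nu)) / eps
assert np.isclose(d_mass, mass(sigma))

# TV norm at nu = 0: the one-sided difference quotients converge to
# +||sigma|| and -||sigma||, so no two-sided derivative exists at 0.
right = (tv(eps * sigma) - tv(0 * sigma)) / eps      # -> +||sigma||
left = (tv(-eps * sigma) - tv(0 * sigma)) / (-eps)   # -> -||sigma||
assert np.isclose(right, tv(sigma)) and np.isclose(left, -tv(sigma))
```

So in this toy model the total-mass term differentiates to the constant $1$, while the TV norm already fails to be differentiable at $0$.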