Consider an optimal control problem: Let $x_0\in\mathbb{R}^n$, let $f\in C^{0,1}([0,1]\times \mathbb{R}^n\times \mathbb{R}^m,\mathbb{R}^n)$ be bounded, and let the state equation be $$\dot{x}(t) = f(t,x(t),u(t)),\quad t\in[0,1], \qquad x(0) = x_0.$$ Let $x_u$ be the unique state associated to a control $u$ and let the cost functional $J$ be defined in terms of the final state only: $J\colon u\mapsto c(x_u(1))$, where $c\in C^1(\mathbb{R}^n,[0,\infty))$.
An author proceeded as follows: Consider the adjoint ODE $$-\dot{\lambda}(t) = \partial_2f(t,x_u(t),u(t))^T\lambda(t),\quad t\in[0,1],\qquad \lambda(1) = \partial c(x_u(1)).$$ Then, for every direction $v$, one would have $$\partial J(u)v = \int_0^1\lambda(t)^T\partial_3f(t,x_u(t),u(t))v(t)\,dt \tag{1}$$ where $\partial J(u)$ denotes the Fréchet derivative of $J$ at $u$.
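As a numerical sanity check (not part of the question itself), formula $(1)$ can be tested on a toy problem against a finite-difference quotient. The example below is my own choice: $f(t,x,u) = -x + u$ with $x(0)=1$ and $c(x)=x^2$, so the adjoint satisfies $-\dot\lambda = -\lambda$, $\lambda(1)=2x_u(1)$, i.e. $\lambda(t) = 2x_u(1)e^{t-1}$, and $(1)$ predicts $\partial J(u)v = \int_0^1 \lambda(t)v(t)\,dt$.

```python
import numpy as np

# Toy check of the adjoint formula (1) (my own example, not from the question):
#   state:  x' = -x + u(t), x(0) = 1      (so f(t,x,u) = -x + u)
#   cost:   J(u) = x_u(1)^2               (so c(x) = x^2)
# Adjoint: -lambda' = -lambda, lambda(1) = 2 x_u(1),
# hence lambda(t) = 2 x_u(1) e^{t-1} and dJ(u)v = int_0^1 lambda(t) v(t) dt.

N = 20000
t = np.linspace(0.0, 1.0, N + 1)
dt = t[1] - t[0]

def solve_state(u):
    """Forward Euler for x' = -x + u(t), x(0) = 1."""
    x = np.empty(N + 1)
    x[0] = 1.0
    for k in range(N):
        x[k + 1] = x[k] + dt * (-x[k] + u[k])
    return x

def J(u):
    return solve_state(u)[-1] ** 2

def trapezoid(y, dx):
    """Composite trapezoidal rule on a uniform grid."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

u = np.sin(2 * np.pi * t)          # some control
v = np.cos(3 * np.pi * t)          # some direction

x = solve_state(u)
lam = 2.0 * x[-1] * np.exp(t - 1.0)              # closed-form adjoint
adjoint_deriv = trapezoid(lam * v, dt)           # (1), with d_u f = 1

eps = 1e-6                                        # central finite difference
fd_deriv = (J(u + eps * v) - J(u - eps * v)) / (2 * eps)

print(adjoint_deriv, fd_deriv)   # the two values should agree closely
```

The agreement (up to discretization error of order $\Delta t$) is of course no proof, but it supports the claim that $(1)$ is the right directional derivative.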
Can I prove this rigorously, without resorting to a loose argument like "consider an isolated change of $u$ at time $t$" and without going fully into distribution theory? If so, how? Perhaps via a limit argument using small local perturbations of $u$?
With respect to which norm on the space of controls should $\partial J(u)$ be computed? This is relevant because I would like to identify $t\mapsto \lambda(t)^T\partial_3f(t,x_u(t),u(t))$ as the gradient of $J$ at $u$.