I'm trying to understand the proofs in Appendices A and B of the Neural ODE paper.
I'm stuck at the definitions. Equation (34) is introduced as "We will prove that if we define an adjoint state $a(t)=\frac{dL}{dz(t)}$ then ..."
Here $L$ is defined in equation (3): $L(z(t_1))=L\left(z(t_0)+\int_{t_0}^{t_1}f(z(t),t,\theta)\,dt\right)$, where $L$ is a scalar function and $z(t)$ satisfies $z'(t)=f(z(t),t,\theta)$.
Similarly, the authors use $\frac{dT_{\epsilon} (z(t))}{dz(t)}$, where $T_\epsilon (z(t))=z(t+\epsilon)=z(t)+\int_{t}^{t+\epsilon}f(z(s),s,\theta)\,ds$, in (15)-(27).
But what are these derivatives $\frac{dT_{\epsilon} (z(t))}{dz(t)},\ \frac{dL}{dz(t)}$? How are they defined? They don't seem to be standard derivatives or Fréchet derivatives of operators...
Upd:
OK, it seems that $\frac{dL}{dz(t)}$ can be interpreted as just $L'(z(t))$, though I am still not sure. I also still don't understand what $\frac{dz(t+\epsilon)}{dz(t)}$ is, or how to get it from the chain rule (Appendix B, eq. 38): $$\frac{dL}{dz(t)}=\frac{dL}{d z(t+\epsilon)}\frac{dz(t+\epsilon)}{dz(t)}$$
Upd 1:
From the Neural ODE paper's definition of the derivative $\frac{dL}{dz(t)}$ (the adjoint), it seems that $$\frac{dL}{dz(t)} \neq L'(z(t))$$
It is easier than you might think. It all comes from the Pontryagin maximum principle and the theory of Lagrange multipliers. For a formal derivation of the adjoint equations, you can check a follow-up work to the original Neural ODE paper [1].
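
To make the notation concrete, here is a minimal numerical sketch (not the paper's code). It assumes a toy scalar system $f(z,t,\theta)=\theta z$ and a toy loss $L(z(t_1))=\tfrac12 z(t_1)^2$, both chosen only for illustration; the helper names `flow`, `loss`, `central_diff` and all constants are hypothetical. The sketch reads $\frac{dL}{dz(t)}$ as the ordinary derivative of the map $z \mapsto L(\Phi_{t\to t_1}(z))$ evaluated at $z=z(t)$, where $\Phi_{t\to t_1}$ sends the state at time $t$ to the state at time $t_1$, and it reads $\frac{dz(t+\epsilon)}{dz(t)}$ as the derivative (Jacobian) of $T_\epsilon$ with respect to its argument. Under that reading, $\frac{dL}{dz(t)}$ differs from $L'(z(t))$ unless $t=t_1$.

```python
import math

THETA = -0.7  # hypothetical parameter of the toy dynamics


def f(z, t, theta=THETA):
    """Toy dynamics z'(t) = theta * z(t), assumed only for illustration."""
    return theta * z


def flow(z, t_start, t_end, steps=2000):
    """Map the state at t_start to the state at t_end using classical RK4."""
    h = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(z + h * k3, t + h)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return z


def loss(z_final):
    """Toy scalar loss, evaluated on the final state z(t1)."""
    return 0.5 * z_final ** 2


def central_diff(g, x, h=1e-5):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


T0, T1 = 0.0, 1.0
t = 0.4                    # an intermediate time
z_t = flow(1.5, T0, t)     # state z(t) reached from z(t0) = 1.5
z_t1 = flow(z_t, t, T1)    # final state z(t1)

# a(t) = dL/dz(t): perturb the state at time t, integrate to t1, then apply L.
a_t = central_diff(lambda z: loss(flow(z, t, T1)), z_t)

# Closed form for this toy problem: a(t) = z(t1) * exp(theta * (t1 - t)).
a_exact = z_t1 * math.exp(THETA * (T1 - t))

# The naive reading L'(z(t)) = z(t) gives a different number.
naive = z_t

# dz(t+eps)/dz(t): derivative of T_eps with respect to the state at time t.
eps = 1e-3
dT = central_diff(lambda z: flow(z, t, t + eps, steps=10), z_t)
first_order = 1 + eps * THETA  # 1 + eps * df/dz, matching dT up to O(eps^2)

print("a(t) by finite differences:", a_t)
print("a(t) closed form:          ", a_exact)
print("naive L'(z(t)):            ", naive)
print("dT_eps/dz(t):", dT, " vs 1 + eps*df/dz:", first_order)
```

For this toy system the adjoint equation $\frac{da}{dt}=-a(t)\frac{\partial f}{\partial z}=-\theta a$ with terminal condition $a(t_1)=z(t_1)$ has the solution $a(t)=z(t_1)e^{\theta(t_1-t)}$, which is what `a_exact` computes. The finite-difference value `a_t` should agree with it, while `naive` should not.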