gradient computation for neural ODEs


I was reading the paper on Neural ODEs (here) and was wondering if anyone could offer some insight into how the gradient of the loss function is calculated.

If we are only considering 2 time points, $t_0,t_1$, I understand how the adjoint method works. However, what confuses me is when the loss function involves multiple time points, say $t_0,t_1,t_2$.
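For reference, the two-point adjoint system from the paper (in its notation, with parameters $\theta$) is

$$\mathbf{a}(t_1) = \frac{dL}{d\mathbf{z}(t_1)}, \qquad \frac{d\mathbf{a}(t)}{dt} = -\mathbf{a}(t)^\top \frac{\partial f(\mathbf{z}(t),t,\theta)}{\partial \mathbf{z}}, \qquad \frac{dL}{d\theta} = -\int_{t_1}^{t_0} \mathbf{a}(t)^\top \frac{\partial f(\mathbf{z}(t),t,\theta)}{\partial \theta}\, dt,$$

solved backwards in time from $t_1$ to $t_0$.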

The paper says (on p. 15) that the adjoint step can be performed separately on each of the intervals $[t_1,t_2]$ and $[t_0,t_1]$, and that the resulting gradients can be summed. I find this confusing, as I do Figure 2 of the paper (page 2).

Using the paper's notation, I understand that $\mathbf{a}(t) = \frac{dL}{d \mathbf{z}(t)}$ and $\mathbf{a}_t(t) = \frac{dL}{dt}(t)$ need to be computed first on the interval $[t_1,t_2]$; these results, together with an adjustment by $\frac{dL}{d \mathbf{z}(t_1)}$ and $\frac{dL}{dt}(t_1)$, are then used to compute the quantities $\mathbf{a}(t)$ and $\mathbf{a}_t(t)$ on the interval $[t_0,t_1]$. Specifically, according to the code in this blog, on the time interval $[t_0,t_1]$ the initial conditions have to be $\mathbf{a}(t_1) + \frac{dL}{d \mathbf{z}(t_1)}$ and $\mathbf{a}_t(t_1) - \frac{dL}{dt}(t_1)$. Can anyone help me understand, or show mathematically, why the adjustments to the gradient computation have to be done like this?
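To make the procedure concrete, here is a minimal numerical sketch of the multi-interval adjoint with the jump at the intermediate observation time. The toy linear dynamics, the quadratic loss, and all names here are my own assumptions for illustration (the paper uses a learned $f$ and a black-box ODE solver); I use plain forward Euler and only track $\frac{dL}{d\mathbf{z}(t_0)}$, omitting the parameter and time gradients:

```python
import numpy as np

# Toy linear dynamics dz/dt = A z (a rotation), standing in for the learned f.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

def f(z, t):
    return A @ z

def df_dz(z, t):
    return A  # Jacobian of f w.r.t. z (constant for linear dynamics)

def euler(z, t_start, t_end, n=2000):
    """Fixed-step forward Euler from t_start to t_end."""
    h = (t_end - t_start) / n
    t = t_start
    for _ in range(n):
        z = z + h * f(z, t)
        t += h
    return z

def backward_interval(z, a, t_hi, t_lo, n=2000):
    """Integrate z and the adjoint a backwards in time from t_hi to t_lo:
       dz/dt = f(z, t),   da/dt = -(df/dz)^T a."""
    h = (t_lo - t_hi) / n  # negative step
    t = t_hi
    for _ in range(n):
        z_next = z + h * f(z, t)
        a = a + h * (-df_dz(z, t).T @ a)
        z = z_next
        t += h
    return z, a

# Observation times and targets; loss L = sum_i 0.5 * ||z(t_i) - y_i||^2.
t0, t1, t2 = 0.0, 0.5, 1.0
y1 = np.array([1.0, 1.0])
y2 = np.array([0.0, 1.0])

def adjoint_grad(z0):
    """dL/dz(t0) via the adjoint method with a jump at t1."""
    z1 = euler(z0, t0, t1)            # forward pass, interval [t0, t1]
    z2 = euler(z1, t1, t2)            # forward pass, interval [t1, t2]
    a = z2 - y2                       # a(t2) = dL/dz(t2)
    z_b, a = backward_interval(z2, a, t2, t1)
    a = a + (z1 - y1)                 # the adjustment from the question:
                                      # a(t1^-) = a(t1^+) + dL/dz(t1)
    _, a = backward_interval(z_b, a, t1, t0)
    return a                          # = dL/dz(t0)
```

The jump at $t_1$ reflects that $L$ depends on $\mathbf{z}(t_1)$ both directly (through the loss term at $t_1$) and indirectly (through $\mathbf{z}(t_2)$); the backward ODE only propagates the indirect path, so the direct term is added when the integration passes $t_1$.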

1 Answer

If you wish to use, e.g., integral loss functions distributed over the whole domain, you may want to take a look at Dissecting Neural ODEs, which has a code implementation in torchdyn.
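As a side note on how such integral losses can be handled in principle: one standard trick (a sketch under my own assumptions, not necessarily how torchdyn implements it) is to augment the state with a running cost $s$, with $\dot{s} = \ell(\mathbf{z}(t))$ and $s(t_0) = 0$, so that $L = s(T)$ becomes a terminal loss and the usual adjoint method applies unchanged:

```python
import numpy as np

def f(z, t):
    return -z  # toy dynamics dz/dt = -z, so z(t) = z0 * exp(-t)

def running_cost(z):
    return np.sum(z ** 2)  # instantaneous cost ell(z(t))

def augmented(x, t):
    # x = [z, s]; the extra coordinate s accumulates the integral of ell.
    z = x[:-1]
    return np.append(f(z, t), running_cost(z))

def euler(g, x, t_start, t_end, n=2000):
    h = (t_end - t_start) / n
    t = t_start
    for _ in range(n):
        x = x + h * g(x, t)
        t += h
    return x

z0 = np.array([1.0])
x_end = euler(augmented, np.append(z0, 0.0), 0.0, 1.0)
L = x_end[-1]  # ≈ ∫_0^1 e^{-2t} dt = (1 - e^{-2}) / 2 ≈ 0.432
```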