I am trying to understand the original Chen et al., Neural ODE paper, but I am finding the language a bit confusing.
The main question is: how many layers does a Neural ODE have? It seems the layers are residual network blocks. The analogy the authors use is training a continuous-time or continuous-depth neural network.
The paper itself states:
"The main technical difficulty in training continuous-depth networks is performing reverse-mode differentiation (also known as backpropagation) through the ODE solver."
But of course we cannot literally have a continuous-depth network, since a continuous-depth network would have infinitely many layers, and a computer cannot evaluate an object like that. So the key question is: how many layers does a Neural ODE have? And as a follow-on, how is that number of layers determined?
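To make my confusion concrete, here is a toy sketch of my current understanding (all names here are my own, not from the paper): if the ODE dh/dt = f(h, t) is solved with a fixed-step Euler method, each solver step looks exactly like one residual block, so the "number of layers" would just be the number of solver steps. Is that the right mental model?

```python
import numpy as np

def f(h, t, W):
    # toy dynamics function: a single tanh layer with weights W
    return np.tanh(W @ h)

def odeint_euler(f, h0, t0, t1, n_steps, W):
    # Fixed-step Euler solve of dh/dt = f(h, t).
    # Each step h <- h + dt * f(h, t) has the same form as a residual
    # block, so n_steps plays the role of "depth".
    h = h0
    dt = (t1 - t0) / n_steps
    for i in range(n_steps):
        h = h + dt * f(h, t0 + i * dt, W)
    return h

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3)) * 0.1
h0 = np.ones(3)

# Same weights W, but two different "depths" (step counts) approximate
# the same underlying continuous trajectory:
shallow = odeint_euler(f, h0, 0.0, 1.0, 10, W)
deep = odeint_euler(f, h0, 0.0, 1.0, 1000, W)
print(shallow)
print(deep)
```

If this is right, then an adaptive solver would choose the step count (and hence the effective depth) on the fly, which is what confuses me about calling it a fixed number of layers.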
Any clarification would be appreciated.