Relation between discrete time and continuous time recurrent neural network dynamical system


I have been fumbling over this problem for quite some time, so I hope you will all understand why I post this question here. In various books and papers there are equations that represent discrete-time recurrent neural networks, and others for continuous-time recurrent neural networks. My goal is to describe the exact mathematical relation between the discrete- and continuous-time systems. I think this relation is important, but I have not been able to find a derivation of it yet.

From the deep learning book (eq. 10.9), the equation of a discrete-time RNN dynamical system for the state variable $h(k)$ (the state at time step $k$) is the following. $J$ and $B$ are weight matrices, $b$ is the bias vector, and $g$ is a nonlinear function, $\tanh$ for example. $$h(k+1)=g(Jh(k)+Bx(k)+b)$$
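To make the discrete map concrete, here is a minimal sketch of one update step. The function name, dimensions, and random parameters below are illustrative, not from any of the referenced sources:

```python
import numpy as np

def discrete_rnn_step(h, x, J, B, b, g=np.tanh):
    """One update of the discrete-time RNN: h(k+1) = g(J h(k) + B x(k) + b)."""
    return g(J @ h + B @ x + b)

# Illustrative dimensions and parameters (chosen arbitrarily).
rng = np.random.default_rng(0)
n, m = 4, 2                      # state size, input size
J = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
b = np.zeros(n)

h = np.zeros(n)                  # state h(k)
x = rng.standard_normal(m)       # input x(k)
h_next = discrete_rnn_step(h, x, J, B, b)   # state h(k+1)
```

Iterating this map from an initial state, feeding in one input vector per step, traces out the discrete-time trajectory.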

Here is the equation of a continuous-time RNN, referenced here.

$$\tau \dot{h}(t)=-h(t) + Jg(h(t))+Bx(t)+b$$
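For intuition, this ODE can be simulated with a forward-Euler step of size $\Delta t$; this is just one simple integration scheme, and the names and parameters below are illustrative. Note that with the special choice $\Delta t = \tau$, the Euler update collapses to $h \leftarrow Jg(h) + Bx + b$, which resembles the discrete map above except that the nonlinearity $g$ sits inside the weight matrix rather than outside, a known structural difference between the two conventions:

```python
import numpy as np

def euler_step(h, x, J, B, b, tau, dt, g=np.tanh):
    """One forward-Euler step of tau * dh/dt = -h + J g(h) + B x(t) + b."""
    dhdt = (-h + J @ g(h) + B @ x + b) / tau
    return h + dt * dhdt

# Illustrative dimensions and parameters (chosen arbitrarily).
rng = np.random.default_rng(1)
n, m, tau = 3, 2, 0.1
J = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
b = rng.standard_normal(n)
h = rng.standard_normal(n)
x = rng.standard_normal(m)

# With dt = tau, the -h term cancels h exactly:
# h_next = J g(h) + B x + b.
h_next = euler_step(h, x, J, B, b, tau, dt=tau)
```

For an accurate simulation of the continuous dynamics one would of course take $\Delta t \ll \tau$; the $\Delta t = \tau$ case is only a formal observation about the shape of the update.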

What I am trying to do is show that these systems correspond, in the sense that each one defines the other. In the dynamical-systems literature this equivalence seems to be taken for granted: research is usually concerned with either the discrete case or the continuous case, but not with the correspondence between them. This correspondence matters because the parameters $J$, $B$, and $b$ are often inferred, or "learned", from discrete data, which inherently defines a discrete dynamical system. However, most analysis (like much of the analysis by David Sussillo) concerns the continuous system. The parameters are learned from a discrete system, but then we analyze a continuous one. I am trying to tie the two equations together mathematically so I can convince myself that this is an OK thing to do.

It seems that this connection is alluded to in the Wikipedia article on RNNs, which I quote:

Note that, by the Shannon sampling theorem, discrete time recurrent neural networks can be viewed as continuous-time recurrent neural networks where the differential equations have transformed into equivalent difference equations.[53] This transformation can be thought of as occurring after the post-synaptic node activation functions $y_{i}(t)$ have been low-pass filtered but prior to sampling.

I have started looking into the Nyquist-Shannon sampling theorem, and there is still more for me to learn there. This statement seems to be saying exactly what I am trying to show. However, I am unable to produce the mathematics that connects the discrete RNN with the continuous one, and this is what I would like to do.