I am working on designing energy-efficient neurons for neuromorphic computing. One of the critical aspects of the dynamics I'm exploring is that they should adhere to the semigroup property. Specifically, given a function $f$ with state $x$ and time $t$, it should satisfy:
$ f(x, t_1 + t_2) = f(f(x, t_1), t_2) $
This property is vital because, during training, I update the dynamics in equally sized small timesteps, while during inference, the evolution of the state $x$ over many timesteps should collapse into a single, cheap function evaluation. The semigroup property ensures consistent behavior between these two stages.
It's worth noting that my primary concern is not biological plausibility. Instead, I aim to discover and design energy-efficient neural models. To this end:
I am interested in classes of functions (potentially vector-valued) that inherently satisfy the semigroup property, which I can explore through grid search or genetic algorithms to find well-performing candidates.
I am aware of matrix exponentiation, which offers a class of functions/differential equations that inherently possess the semigroup property. However, I am curious if there are additional classes of functions or mathematical models that exhibit this property.
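To make the matrix-exponentiation case concrete, here is a minimal numerical sketch (the random generator matrix $A$ is an arbitrary assumption for illustration) checking that $f(x, t) = e^{tA}x$ composes as required:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # arbitrary generator matrix (assumed)
x = rng.standard_normal(4)
t1, t2 = 0.3, 1.1

def f(x, t):
    """Linear semigroup: f(x, t) = exp(t*A) @ x."""
    return expm(t * A) @ x

# Semigroup property: evolving for t1 + t2 equals evolving in two stages.
lhs = f(x, t1 + t2)
rhs = f(f(x, t1), t2)
print(np.allclose(lhs, rhs))  # True up to floating-point error
```

This works because $e^{t_1 A}$ and $e^{t_2 A}$ commute (both are powers of the same generator), so $e^{t_2 A} e^{t_1 A} = e^{(t_1 + t_2) A}$.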
Are there methods or techniques to construct functions (or systems of functions for multi-dimensional states) that adhere to this property?
Given the context of energy-efficient neural dynamics, are there any known mathematical constructs or approximations that might be suitable?
Any insights, references, or pointers on this topic would be highly appreciated.
I will assume that the state space is modeled by the $d$-dimensional Euclidean space $\mathbb{R}^d$.
Now let $f : \mathbb{R}^d \times [0, \infty) \to \mathbb{R}^d$ be a $C^1$ function such that
$$ f(x, 0) = x, \qquad \frac{\partial f}{\partial t}(x, 0) = v(x), \qquad f(x, t+s) = f(f(x, t), s) $$
for some $v : \mathbb{R}^d \to \mathbb{R}^d$ and for all $x \in \mathbb{R}^d$ and $s, t \geq 0$. (The condition $f(x, 0) = x$ is a natural choice in light of the semigroup property.)
Then it follows that
\begin{align*} \frac{\partial}{\partial t} f(x, t) &= \frac{\partial}{\partial s}\biggr|_{s=0} f(x, t+s) = \frac{\partial}{\partial s}\biggr|_{s=0} f(f(x, s), t) \\ &= \biggl( \frac{\partial f}{\partial x}\biggr|_{(x, t)} \biggr) \biggl( \frac{\partial f}{\partial t}\biggr|_{(x, 0)}\biggr), \end{align*}
where $f$ and $\frac{\partial f}{\partial t}$ are regarded as column vectors and $\frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_d} \end{bmatrix}$ is regarded as a matrix. Simplifying the above result, it follows that $f$ solves the equation
$$ \frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}v(x). \tag{1} $$
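As a quick sanity check on $(1)$, here is a symbolic sketch for a hand-picked scalar example (the choices $v(x) = x$ and flow $f(x, t) = x e^t$ are illustrative assumptions):

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
v = x                      # illustrative vector field v(x) = x
f = x * sp.exp(t)          # the corresponding flow of x' = v(x)

# Equation (1): df/dt should equal (df/dx) * v(x).
residual = sp.simplify(sp.diff(f, t) - sp.diff(f, x) * v)
print(residual)  # 0
```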
This equation can be tackled by the method of characteristics. Let $x : [0, \infty) \to \mathbb{R}^d$ be a solution of the equation
$$ x'(t) = -v(x(t)) \tag{2} $$
Then by $(1)$ and $(2)$, we get
$$ \frac{\mathrm{d}}{\mathrm{d} t} f(x(t), t) = \frac{\partial f}{\partial x}x'(t) + \frac{\partial f}{\partial t} = -\frac{\partial f}{\partial x}v(x(t)) + \frac{\partial f}{\partial x}v(x(t)) = 0. $$
Hence, $t \mapsto f(x(t), t)$ is a constant function satisfying
$$ f(x(t), t) = f(x(0), 0) = x(0). $$
Let $y(s) = x(t-s)$ be the time reversal of $x(\cdot)$, so that $y'(s) = -x'(t-s) = v(y(s))$ by $(2)$, with $y(0) = x(t)$ and $y(t) = x(0)$. Then the above formula yields:
$$ f(x, t) = y(t), \qquad \text{where } y \text{ solves } y'(s) = v(y(s)), \quad y(0) = x. \tag{3} $$
The observation can be applied in reverse as well. Specifically, if we start with a function $v : \mathbb{R}^d \to \mathbb{R}^d$ regular enough that solutions are unique (e.g. locally Lipschitz), then the function $f(x, t)$ defined by $(3)$ satisfies the semigroup property. Hence, identifying functions with the semigroup property is essentially the same as solving the autonomous equation $y'(s) = v(y(s))$.
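Here is a minimal numerical sketch of this reverse construction (the pendulum vector field $v$ below is an illustrative assumption): integrate $y' = v(y)$ from $y(0) = x$ to obtain the flow map, then verify its semigroup property.

```python
import numpy as np
from scipy.integrate import solve_ivp

def v(y):
    """Illustrative nonlinear vector field (a pendulum)."""
    return np.array([y[1], -np.sin(y[0])])

def f(x, t):
    """Flow map defined by (3): integrate y' = v(y), y(0) = x, up to time t."""
    sol = solve_ivp(lambda s, y: v(y), (0.0, t), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x0 = np.array([1.0, 0.0])
t1, t2 = 0.7, 1.3
lhs = f(x0, t1 + t2)       # evolve for t1 + t2 in one go
rhs = f(f(x0, t1), t2)     # evolve in two stages
print(np.allclose(lhs, rhs, atol=1e-7))  # True (up to integrator tolerance)
```

The agreement holds only up to integrator tolerance here; an exact flow map would satisfy it identically.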
Here are some thoughts:
I notice a resemblance between $(3)$ and flow-based generative models. However, my expertise in this area is limited, so I cannot conclusively determine whether they are truly related.
From a computational standpoint, it's unclear whether this offers any benefits. Many numerical integrators for ODEs, like the Euler method or Runge-Kutta methods, construct solutions incrementally, which necessitates an iterative process spanning numerous timesteps. However, if there's an efficient method to train and compute $v$ (including its derivatives when needed), then utilizing $(3)$ could potentially be efficient as well.
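To illustrate the potential saving, here is a sketch with a toy example (the logistic field is an assumption chosen because its flow has a closed form): many small Euler steps reproduce, only approximately and iteratively, what a single evaluation of the closed-form flow gives directly, and the closed-form flow satisfies the semigroup property exactly.

```python
import numpy as np

def flow(x, t):
    """Closed-form flow of the logistic ODE x' = x(1 - x)."""
    return x * np.exp(t) / (1.0 + x * (np.exp(t) - 1.0))

def euler(x, t, n_steps):
    """Iterative explicit Euler integration of the same ODE."""
    h = t / n_steps
    for _ in range(n_steps):
        x = x + h * x * (1.0 - x)
    return x

x0, t = 0.2, 2.0
exact = flow(x0, t)                        # one cheap evaluation
approx = euler(x0, t, n_steps=100_000)     # many timesteps
print(abs(exact - approx) < 1e-4)          # Euler converges to the flow
# The closed-form flow satisfies the semigroup property exactly:
print(np.isclose(flow(x0, 1.5), flow(flow(x0, 0.6), 0.9)))
```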