I'm reading about the method of two-timing in section 7.6 of Nonlinear Dynamics and Chaos by Strogatz, and I have some questions about how to make this concept rigorous. In this section the book considers equations of the form $$ \hspace{4.5cm} \ddot{x} + x + \epsilon h(x,\dot{x}) = 0 \hspace{4.5cm} (1) $$ where $0 \leq \epsilon \ll 1$, $x: \mathbb{R} \to \mathbb{R}$, and $h: \mathbb{R}^2 \to \mathbb{R}$ is an arbitrary smooth function. The section first covers regular perturbation theory, which I feel okay with. But then the author gets to the method of two-timing for ODEs that exhibit multiple time scales, and this is where the explanation starts to feel a little hand-wavy. Here is Strogatz's description of the method:
To apply two-timing to (1), let $\tau = t$ denote the fast $O(1)$ time, and let $T = \epsilon t$ denote the slow time. We'll treat these two times as if they were independent variables. In particular, functions of the slow time $T$ will be regarded as constants on the fast time scale $\tau$. It's hard to justify this idea rigorously, but it works. [Arg! Show me the rigor!]...Next we turn to the mechanics of the method. We expand the solution of (1) as a series $$ \hspace{3cm} x(t,\epsilon) = x_0(\tau, T) + \epsilon x_1(\tau, T) + O(\epsilon^2). \hspace{3cm} (2) $$ The time derivatives in (1) are transformed according to the chain rule: \begin{align*} \hspace{3cm} \dot{x} = \frac{dx}{dt} = \frac{\partial x}{\partial \tau} + \frac{\partial x}{\partial T} \frac{\partial T}{\partial t} = \frac{\partial x}{\partial \tau} + \epsilon \frac{\partial x}{\partial T} \hspace{3cm} (3) \end{align*} A subscript notation for differentiation is more compact; thus we write (3) as $$ \hspace{5.1cm} \dot{x} = \partial_{\tau} x + \epsilon \partial_T x. \hspace{5.1cm} (4) $$
After substituting (2) into (4) and collecting powers of $\epsilon$, we find $$ \hspace{3.3cm} \dot{x} = \partial_{\tau} x_0 + \epsilon (\partial_T x_0 + \partial_{\tau} x_1) + O(\epsilon^2). \hspace{3.3cm} (5) $$ Similarly, $$ \hspace{3cm} \ddot{x} = \partial_{\tau \tau} x_0 + \epsilon(\partial_{\tau \tau} x_1 + 2 \partial_{T \tau} x_0) + O(\epsilon^2). \hspace{3cm} (6) $$
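To convince myself that the bookkeeping in (5) and (6) is right, I checked it symbolically (a sketch using sympy; the names `x0`, `x1` and the operator `Dt` are mine, not Strogatz's):

```python
import sympy as sp

tau, T, eps = sp.symbols('tau T epsilon')
x0 = sp.Function('x0')(tau, T)
x1 = sp.Function('x1')(tau, T)

# Two-timing ansatz (2), truncated at O(eps^2)
x = x0 + eps * x1

# Total time derivative per the chain rule (4): d/dt = d_tau + eps * d_T
def Dt(f):
    return sp.diff(f, tau) + eps * sp.diff(f, T)

xdot = sp.expand(Dt(x))
xddot = sp.expand(Dt(Dt(x)))

# The eps^0 and eps^1 coefficients reproduce (5) and (6)
assert xdot.coeff(eps, 0) == sp.diff(x0, tau)
assert xdot.coeff(eps, 1) == sp.diff(x1, tau) + sp.diff(x0, T)
assert xddot.coeff(eps, 0) == sp.diff(x0, tau, 2)
assert sp.expand(xddot.coeff(eps, 1)
                 - sp.diff(x1, tau, 2) - 2*sp.diff(x0, tau, T)) == 0
```

(The mixed partial $2\,\partial_{T\tau} x_0$ in (6) comes out as the sum $\partial_{T\tau} x_0 + \partial_{\tau T} x_0$, which sympy canonicalizes to a single term.)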
My main question: How can we make the above ideas rigorous?
My attempt is as follows: Given the ODE (1), we first assume that the solution $x(t;\epsilon)$ can be expressed as $$ x(t;\epsilon) = X(t,\epsilon t)$$
for some function $X: \mathbb{R}^2 \to \mathbb{R}$. Now let $\tau, T: \mathbb{R} \to \mathbb{R}$ be functions defined by $\tau(t) := t$ and $T(t;\epsilon) := \epsilon t$ for all $t \in \mathbb{R}, \epsilon > 0$. We then have $X(\tau(t),T(t;\epsilon)) = x(t;\epsilon)$ for all $t \in \mathbb{R}, \epsilon > 0$. The chain rule computation is then straightforward:
\begin{align*}
\dot{X} := \frac{dX}{dt} &= \frac{\partial X}{\partial \tau} \frac{d\tau}{dt} + \frac{\partial X}{\partial T} \frac{dT}{dt} = \frac{\partial X}{\partial \tau} + \epsilon \frac{\partial X}{\partial T}.
\end{align*}
The subsequent equations (4)-(6) should now be the same, just with $x$ replaced by $X$. (Is my interpretation correct so far?)
The Taylor expansion part is where I'm a bit perplexed: It seems that (2) is a Taylor expansion of the function $\epsilon \mapsto x(t; \epsilon)$, or equivalently, $\epsilon \mapsto X(\tau(t), T(t))$. Now if we had some arbitrary smooth function $f: \mathbb{R}^2 \to \mathbb{R}$ indexed by some parameter $\epsilon$, then I agree that the map $\epsilon \mapsto f(x,y; \epsilon)$ has some expansion $$ f(x,y; \epsilon) = f_0(x,y) + f_1(x,y) \epsilon + f_2(x,y) \epsilon^2 + \cdots, $$
(provided that it is smooth), where the coefficient functions $f_0, f_1,\ldots$ do not depend on $\epsilon$. But my problem with (2) is that the "coefficients" of the Taylor series, namely $x_0(\tau,T), x_1(\tau,T),\ldots$ are functions of $\epsilon$, since $T$ depends on $\epsilon$. How then is this a legitimate Taylor expansion?
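To see what is at stake, I played with a toy function of my own (not from the book). At fixed $t$, the honest Taylor expansion of $\epsilon \mapsto x(t;\epsilon)$ does have $\epsilon$-free coefficients, but they grow secularly in $t$; the two-variable form repackages exactly that growth into $T$:

```python
import sympy as sp

t, eps = sp.symbols('t epsilon', positive=True)

# Toy example (my own): suppose the true solution were x(t; eps) = exp(-eps*t)*cos(t)
x = sp.exp(-eps*t) * sp.cos(t)

# Naive Taylor expansion in eps at fixed t: the eps^n coefficient carries a
# secular factor t^n, so the truncation is only accurate for t << 1/eps.
naive = sp.expand(sp.series(x, eps, 0, 3).removeO())
assert sp.expand(naive - (sp.cos(t) - eps*t*sp.cos(t)
                          + eps**2 * t**2 * sp.cos(t)/2)) == 0

# Two-timing repackaging: with tau = t and T = eps*t, the same solution is
# X(tau, T) = exp(-T)*cos(tau), whose "coefficients" stay bounded in tau.
tau, T = sp.symbols('tau T')
X = sp.exp(-T) * sp.cos(tau)
assert sp.simplify(X.subs({tau: t, T: eps*t}) - x) == 0
```

So in this toy case the $\epsilon$-dependence of the $x_i(\tau, T)$ enters only through the evaluation point $T = \epsilon t$, not through the functions themselves.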
Also, an additional question: Is there a way to know a priori if the solution $x = x(t;\epsilon)$ to some ODE $\dot{x} = f(x)$ can be expressed in the form $x(t;\epsilon) = X(t, \epsilon t)$? Or do we generally make such an assumption based on intuition? In one of the examples that Strogatz uses, the ODE has the analytic solution $x(t) = (1-\epsilon^2)^{-1/2} e^{-\epsilon t} \cos((1-\epsilon^2)^{1/2} t)$, so clearly $X(t_1, t_2) := (1-\epsilon^2)^{-1/2} e^{-t_2} \cos((1-\epsilon^2)^{1/2} t_1)$ satisfies the requirement. But is there a way to see this a priori (i.e., if we couldn't analytically solve the ODE)?
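For concreteness, I believe the example in question is the weakly damped oscillator $\ddot{x} + 2\epsilon\dot{x} + x = 0$ (i.e., $h(x,\dot{x}) = 2\dot{x}$; that identification is my assumption, not stated above). A quick symbolic check that the quoted closed form solves it:

```python
import sympy as sp

t, eps = sp.symbols('t epsilon', positive=True)
omega = sp.sqrt(1 - eps**2)

# Closed-form solution quoted above; the underlying ODE (my assumption)
# is the weakly damped oscillator xddot + 2*eps*xdot + x = 0.
x = sp.exp(-eps*t) * sp.cos(omega*t) / omega

# Residual of the ODE should vanish identically
residual = sp.diff(x, t, 2) + 2*eps*sp.diff(x, t) + x
assert sp.simplify(residual) == 0
```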
Any insights would be greatly appreciated. This two-timing concept is rather confusing to me, and I think seeing the details spelled out very rigorously would help my understanding.
So firstly, they really should be writing $X(t, \epsilon t; \epsilon)$. Without the explicit $\epsilon$ argument, the expansion in terms of $x_i(\tau, T)$ doesn't quite make sense. But, as you note, for any (smooth) function $X(\tau, T; \epsilon)$ there exists some sequence of $\epsilon$-independent functions $x_i$ such that $$ X(\tau, T;\epsilon) = x_0(\tau, T) + \epsilon x_1(\tau, T) + \epsilon^2 x_2(\tau, T) + \cdots $$
So here's what we're going to do with this. We substitute this series into the differential equation using $x(t;\epsilon) = X(t, \epsilon t;\epsilon)$, then collect orders of $\epsilon$. This yields a sequence of relations among the $x_i$ and their $\tau$- and $T$-derivatives, evaluated at $\tau = t$, $T = \epsilon t$. We then impose the stronger requirement that the $x_i$ and their derivatives satisfy those relations for every $(\tau, T)$, which is legitimate because any such set of $x_i$ will in particular satisfy the relations at $\tau = t$, $T = \epsilon t$. Now we have a set of simpler differential equations entirely in terms of $\tau$ and $T$, with no $\epsilon$s or $t$s around. If things have gone right, these can be solved recursively, order by order.
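To make the recipe concrete, here is the collection step carried out symbolically for the damped-oscillator instance $h(x,\dot{x}) = 2\dot{x}$ (my choice for illustration; the `Dt` operator and function names are mine):

```python
import sympy as sp

tau, T, eps = sp.symbols('tau T epsilon')
x0 = sp.Function('x0')(tau, T)
x1 = sp.Function('x1')(tau, T)

def Dt(f):  # chain rule: d/dt = d_tau + eps * d_T
    return sp.diff(f, tau) + eps * sp.diff(f, T)

# Concrete instance (my choice): xddot + 2*eps*xdot + x = 0
x = x0 + eps * x1
lhs = sp.expand(Dt(Dt(x)) + 2*eps*Dt(x) + x)

# O(1):   d_tautau x0 + x0 = 0   (solved by x0 = A(T)*cos(tau + phi(T)))
# O(eps): d_tautau x1 + x1 = -2*d_Ttau x0 - 2*d_tau x0
order0 = lhs.coeff(eps, 0)
order1 = lhs.coeff(eps, 1)
assert order0 == sp.diff(x0, tau, 2) + x0
assert sp.expand(order1 - (sp.diff(x1, tau, 2) + x1
                           + 2*sp.diff(x0, tau, T) + 2*sp.diff(x0, tau))) == 0
```

The $O(\epsilon)$ equation is then forced by $x_0$; demanding that the resonant (secular) terms on its right-hand side vanish is what pins down the slow evolution of the amplitude $A(T)$ and phase $\phi(T)$.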