In a paper I'm reading at the moment, we're concerned with a third-order nonlinear ODE whose solution near the origin looks something like an upside-down parabola crossing the $y$-axis at some large value of $y$, with small oscillations of nonconstant frequency superimposed on it.
After a series of rescalings, the equation is written in terms of a function $g$ of one variable, $z$, which is then assumed to have the expansion \begin{equation} g\sim g_0(z)+\theta g_1(z,t), \end{equation} where $g_0(z)$ describes the parabola, $t=F(Z)$ and $z=\epsilon^{\frac{1}{2}}Z$. Here $\epsilon$ is a scaling related to the size of the function at $0$, and $F$ is to be determined later by 'transforming to an equation of constant frequency'.
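One way to see what 'transforming to an equation of constant frequency' can mean is the standard Liouville–Green (WKB) choice of fast variable: for a linear model problem $y''+\omega(\epsilon x)^2y=0$, taking $t=F(x)=\int_0^x\omega(\epsilon s)\,ds$ makes the leading-order oscillation $\cos t$, which has constant frequency in $t$. The sketch below is only an analogy, not the paper's equation: the model problem, $\omega(X)=1+X$, $\epsilon=0.1$ and the helper `rk4_solve` are all hypothetical choices. It compares a numerical solution with the strained-variable approximation $\omega^{-1/2}\cos F(x)$ and with a naive fixed-frequency guess $\cos x$.

```python
import math

EPS = 0.1  # hypothetical small parameter

def omega(X):
    # Hypothetical slowly varying frequency, evaluated at the slow variable X = EPS*x
    return 1.0 + X

def F(x):
    # Fast variable t = F(x) = integral_0^x omega(EPS*s) ds, in closed form for omega = 1 + X
    return x + EPS * x**2 / 2.0

def rk4_solve(x_end, n):
    """Classical RK4 for y'' + omega(EPS*x)^2 y = 0 with y(0) = 1, y'(0) = 0."""
    h = x_end / n

    def rhs(x, y, v):
        return v, -omega(EPS * x) ** 2 * y

    xs, ys = [0.0], [1.0]
    y, v = 1.0, 0.0
    for i in range(n):
        x = i * h
        k1y, k1v = rhs(x, y, v)
        k2y, k2v = rhs(x + h / 2, y + h / 2 * k1y, v + h / 2 * k1v)
        k3y, k3v = rhs(x + h / 2, y + h / 2 * k2y, v + h / 2 * k2v)
        k4y, k4v = rhs(x + h, y + h * k3y, v + h * k3v)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        xs.append((i + 1) * h)
        ys.append(y)
    return xs, ys

xs, ys = rk4_solve(x_end=10.0, n=1000)  # x up to 1/EPS, where omega doubles

# Leading-order approximation: amplitude omega^{-1/2} (omega(0) = 1), phase F(x),
# i.e. cos(t) in the fast variable t
err_strained = max(abs(y - omega(EPS * x) ** -0.5 * math.cos(F(x))) for x, y in zip(xs, ys))
# Naive guess that ignores the change in frequency
err_fixed = max(abs(y - math.cos(x)) for x, y in zip(xs, ys))

print(f"max error, strained variable t = F(x): {err_strained:.3f}")
print(f"max error, fixed frequency cos(x):     {err_fixed:.3f}")
```

The strained variable stays close over the whole range, whereas $\cos x$ drifts out of phase once $\epsilon x^2/2$ is of order one.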
The equation is then recast in terms of the operator $\mathscr{L}=\frac{\partial}{\partial t}+\frac{\epsilon^{\frac{1}{2}}}{F'(Z)}\frac{\partial}{\partial z}$.
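Assuming the derivatives in the original ODE are taken with respect to $Z$, the operator comes straight from the chain rule applied to $g_1(z,t)$ with $z=\epsilon^{\frac{1}{2}}Z$ and $t=F(Z)$; $\mathscr{L}$ is just $\frac{d}{dZ}$ divided by $F'(Z)$. Treating the coefficient $1/F'(Z)$ as a function of $z$, the second line follows by applying the first one twice.

```latex
\begin{aligned}
\frac{d}{dZ}\,g_1\big(\epsilon^{\frac{1}{2}}Z,\,F(Z)\big)
  &= F'(Z)\,\frac{\partial g_1}{\partial t} + \epsilon^{\frac{1}{2}}\,\frac{\partial g_1}{\partial z}
   = F'(Z)\left(\frac{\partial}{\partial t}+\frac{\epsilon^{\frac{1}{2}}}{F'(Z)}\frac{\partial}{\partial z}\right)g_1
   = F'(Z)\,\mathscr{L}\,g_1,\\
\frac{d^2}{dZ^2}\,g_1 &= F'(Z)^2\,\mathscr{L}^2 g_1 + F''(Z)\,\mathscr{L}\,g_1 .
\end{aligned}
```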
I sort of get why we're doing this: we want to capture the behaviour on large and small scales simultaneously. However, I think I'm missing a lot of the theory of how this works and where it comes from. Where can I read about multiscale operators? I don't really know where to look.
What you're describing is called the method of multiple scales, also known as two-timing (among other names). You can find out how it works in, e.g., M.H. Holmes' *Introduction to Perturbation Methods*, $\S3.1-3.2$ in my (old) edition. A lot of research has gone into trying to justify it rigorously, btw.
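A minimal two-timing sketch, using the common textbook model $y''+\epsilon y'+y=0$, $y(0)=0$, $y'(0)=1$ (a hypothetical illustration, not the paper's equation). Introducing the slow time $\tau=\epsilon t$ and removing secular terms gives, at leading order, $y\sim e^{-\tau/2}\sin t$, while the regular expansion gives $\sin t-\tfrac{\epsilon}{2}t\sin t$, which breaks down once $t=O(1/\epsilon)$. The code compares both against the exact solution over $0\le t\le 5/\epsilon$; the value $\epsilon=0.1$ is an arbitrary choice.

```python
import math

EPS = 0.1        # hypothetical small damping parameter
T_END = 5 / EPS  # run well into the t = O(1/eps) range
N = 5000         # number of sample intervals

def exact(t):
    # Exact solution of y'' + EPS*y' + y = 0 with y(0) = 0, y'(0) = 1
    w = math.sqrt(1.0 - EPS**2 / 4.0)
    return math.exp(-EPS * t / 2.0) * math.sin(w * t) / w

def regular(t):
    # Two-term regular expansion y0 + EPS*y1; the t*sin(t) term is secular
    return math.sin(t) - EPS * t * math.sin(t) / 2.0

def two_timing(t):
    # Leading-order multiple-scales result: fast time t, slow time tau = EPS*t
    tau = EPS * t
    return math.exp(-tau / 2.0) * math.sin(t)

ts = [k * T_END / N for k in range(N + 1)]
err_regular = max(abs(regular(t) - exact(t)) for t in ts)
err_two_timing = max(abs(two_timing(t) - exact(t)) for t in ts)

print(f"max error on [0, {T_END:.0f}], regular expansion: {err_regular:.3f}")
print(f"max error on [0, {T_END:.0f}], two-timing:        {err_two_timing:.4f}")
```

The slow time lets the decay $e^{-\tau/2}$ sit in the amplitude instead of being expanded as $1-\tfrac{\epsilon t}{2}+\dots$, and that expansion is exactly what produces the secular term.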