Consider a diffusion process which has a drift that pushes it towards some fixed value.
Specifically, consider the example of the maximum-likelihood estimator (MLE) for a Bernoulli random variable, given by $\mu_t = \frac{C_t}{t}$ where $C_t$ is the count of successes in the first $t$ trials. We know $\mu_t \rightarrow p$ where $p$ is the Bernoulli parameter, i.e. each trial is $\text{Bernoulli}(p)$ and so $C_t \sim \text{Binomial}(t, p)$.
The drift of a process is defined as $\lim_{dt\rightarrow 0} \frac{1}{dt}\mathbb{E}[\mu_{t+dt} - \mu_t | \mu_t] $.
Then, whenever $\mu_t < p$, the drift would be positive, pushing $\mu_t$ toward $p$, and vice versa.
In the continuum limit of this process, i.e. $dC_t = \sqrt{dt}$ and $dt \rightarrow 0$, this is a diffusion process. How can we define the drift? I believe it should be piecewise constant, with value depending on whether $\mu_t > p$ or $\mu_t < p$.
First off, you should work with a Poisson process with jump rate $p$ instead of directly with the discrete count $C_t$. This is a reasonable approximation for large time, by basically the same logic as the Poisson approximation to the binomial. I'll call that Poisson process $C_t$ from here on.
The main idea is that the Poisson process $C_t$ itself cannot be described directly by a diffusion process. One must instead work with a process whose drift is small, if not zero, because $C_t$ converted to a continuum limit in a naive way would either be deterministic (because $O(t/dt)$ steps of magnitude $dt$ occur and the law of large numbers kicks in) or diverge instantly (because $O(t/dt)$ steps of magnitude $(dt)^{1/2}$ occur and, because of the drift, they do not cancel out).
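To make the two failure modes concrete, here is a minimal numerical sketch. It assumes a discretisation that is my own choice, not something from Gardiner: $t/dt$ independent Bernoulli($p$) steps, each scaled either by $dt$ or by $\sqrt{dt}$. The function name `naive_limit_stats` and all parameter values are hypothetical.

```python
import numpy as np

def naive_limit_stats(step_size, dt, t=1.0, p=0.3, n_paths=20000, seed=0):
    """Mean and std of a sum of t/dt Bernoulli(p) steps, each of size step_size(dt).

    Assumption: one Bernoulli(p) trial per interval of length dt.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t / dt))
    # Number of successes among n_steps trials, one draw per sample path.
    successes = rng.binomial(n_steps, p, size=n_paths)
    values = successes * step_size(dt)
    return values.mean(), values.std()

for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    m1, s1 = naive_limit_stats(lambda h: h, dt)   # steps of magnitude dt
    m2, s2 = naive_limit_stats(np.sqrt, dt)       # steps of magnitude sqrt(dt)
    print(f"dt={dt:g}  size dt: mean={m1:.3f} std={s1:.4f}   "
          f"size sqrt(dt): mean={m2:.1f} std={s2:.3f}")
```

With steps of size $dt$ the standard deviation shrinks like $\sqrt{dt}$ while the mean stays near $pt$, so the limit is deterministic. With steps of size $\sqrt{dt}$ the spread stays of order one but the mean grows like $pt/\sqrt{dt}$, so the limit diverges.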
In this context it makes sense to achieve the approximation by defining a time scale $T \gg 1$ and examining $c_\tau=C_{T \tau}=T \mu_\tau + T^{1/2} \sigma_\tau$ where $\mu_\tau$ is deterministic and $\sigma_\tau$ is random (here $\mu_\tau$ is the deterministic part of the rescaled count, not the estimator $\mu_t$ from the question). Neither can depend explicitly on $T$ (if they do then our scaling was wrong).
This is a kind of analogue of the van Kampen system size expansion, where the large parameter is forced in by stretching out the time scale. (Note that this focus on large time is necessary: there is no way that the short time dynamics can be resolved by a diffusion process in any reasonable sense.)
In this case $\mu_\tau=p\tau$ necessarily, and then you are left to perform a Taylor expansion of the master equation for the evolution of $\sigma_\tau=T^{-1/2} \left ( c_\tau-T\mu_\tau \right )=T^{-1/2} c_\tau - T^{1/2} p \tau$ in powers of the small parameter $T^{-1/2}$ in order to isolate the "microscopic drift" (if there is any) and the diffusion inside of $c_\tau$. It looks like the first moment is just zero, while the second moment is $T^{-1}$ times the second moment of the increment distribution of $c_\tau$ (which is $pT$ per unit of $\tau$), i.e. $p$. So for $T \gg 1$ and $t=T\tau$ where $\tau$ is of order $1$, the PDF $f(x,\tau)$ of $\sigma_\tau$ asymptotically exists and evolves as
$$\frac{\partial f}{\partial \tau}=\frac{p}{2} \frac{\partial^2 f}{\partial x^2}.$$
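For what it's worth, the expansion behind this equation can be sketched as follows. This is only a heuristic outline, assuming $f(x,\tau)$ denotes the density of $\sigma_\tau$ (treated as smooth) and writing $\varepsilon = T^{-1/2}$ (my notation, not Gardiner's). In the variable $\tau$ the process $\sigma_\tau$ jumps by $+\varepsilon$ at rate $pT$ and moves deterministically at speed $-pT^{1/2}$ between jumps, so

$$\begin{aligned}
\frac{\partial f}{\partial \tau} &= pT^{1/2}\,\frac{\partial f}{\partial x} + pT\left[f(x-\varepsilon,\tau)-f(x,\tau)\right] \\
&= pT^{1/2}\,\frac{\partial f}{\partial x} + pT\left[-\varepsilon\,\frac{\partial f}{\partial x}+\frac{\varepsilon^2}{2}\,\frac{\partial^2 f}{\partial x^2}-\frac{\varepsilon^3}{6}\,\frac{\partial^3 f}{\partial x^3}+\cdots\right] \\
&= \frac{p}{2}\,\frac{\partial^2 f}{\partial x^2} - \frac{p}{6}\,T^{-1/2}\,\frac{\partial^3 f}{\partial x^3}+\cdots
\end{aligned}$$

The $O(T^{1/2})$ terms cancel exactly because $\mu_\tau = p\tau$, and everything beyond the second derivative is suppressed by powers of $T^{-1/2}$.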
Consequently the overall process $C_t$ behaves like $pt+\sqrt{p} B_t$ where $B_t$ is a Brownian motion, for large time.
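A quick numerical sanity check of this approximation, as a sketch only: it samples $C_t$ at a single large time and checks just the mean and variance of the rescaled fluctuation, not the full path behaviour. The function name `poisson_fluctuation_stats` and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

def poisson_fluctuation_stats(p=0.3, t=2000.0, n_paths=50000, seed=1):
    """Sample mean and variance of (C_t - p t) / sqrt(t) for a rate-p Poisson process."""
    rng = np.random.default_rng(seed)
    # For a Poisson process with jump rate p, C_t is Poisson(p * t) distributed.
    counts = rng.poisson(p * t, size=n_paths)
    # If C_t ~ p t + sqrt(p) B_t, this should have mean near 0 and variance near p.
    z = (counts - p * t) / np.sqrt(t)
    return z.mean(), z.var()

mean, var = poisson_fluctuation_stats()
print(f"mean = {mean:.4f} (expected about 0), "
      f"variance = {var:.4f} (expected about p = 0.3)")
```

Dividing through by $t$, the same approximation says the estimator behaves like $C_t/t \approx p + \sqrt{p}\,B_t/t$, whose fluctuations have variance $p/t$.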
Remarks:
My main source here is Gardiner's Handbook of Stochastic Methods, Chapter 7.