When is a stochastic process a diffusion process?
Wikipedia says "A diffusion process is a Markov process with continuous sample paths for which the Kolmogorov forward equation is the Fokker–Planck equation." I don't know how to interpret this. Doesn't it just mean the process has the typical diffusion form \begin{equation}dX_t = \mu(X_t) dt + \sigma(X_t) dW_t, \;\; X_0 = x_0?\end{equation} And how would one check this for an arbitrary process?
I understand this means there's an average drift and a variance around that drift. But I don't know if this form is valid for an arbitrary process $X_t$, since we can always define the "drift" as $ \mu(X_t) = \mathbb{E}[X_t]$ and variance $\sigma(X_t) = \text{Var}(X_t)$.
For example, define a process $X_t$ that increases by $dX$ with probability $p \in [0,1]$ in each time interval $dt$. This resembles a Poisson process with rate $p$ and infinitesimal jump size. Since the mean and variance of Poi$(p)$ are both $p$, I guess $\mu(X_t)dt = p\,dt$ and $\sigma = \sqrt{p}$. But I still don't know if the Kolmogorov forward equation is the Fokker-Planck equation.
I did some reading and here is a partial answer:
Definition: Diffusion process (see here)
A Markov process $X_t$ with transition probability $P(\Gamma, t|x,s)$ is a diffusion process if
1. for every $x$ and every $\epsilon>0$, $$\int_{|x-y|>\epsilon} P(dy,t|x,s) = o(t-s)$$ as $t \downarrow s$. I.e. the probability of moving more than $\epsilon$ in a short time interval is negligible compared with the length of the interval.
2. there is a drift $a(x,s)$ such that for every $x$ and every $\epsilon >0$, $$ \int_{|y-x|\leq \epsilon} (y-x) P(dy,t|x,s) = a(x,s)(t-s) + o(t-s).$$
3. there is a diffusion coefficient $b(x,s)$ such that for every $x$ and every $\epsilon >0$, $$\int_{|y-x|\leq \epsilon} (y-x)^2 P(dy,t|x,s) = b(x,s)(t-s) + o(t-s).$$
I.e. the drift and diffusion coefficients specify the distribution of the movement, giving a necessary concentration of measure.
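To see what the first condition rules out, here is a minimal sketch comparing the tail probability $P(|X_{s+h}-X_s|>\epsilon)$, divided by $h$, for standard Brownian motion and for a Poisson process. The choice of a Poisson process with unit jumps (rather than the "infinitesimal" jumps in my question), the values of `eps` and `rate`, and the function names are all my own assumptions for illustration. Both probabilities are computed in closed form, so nothing is simulated.

```python
import math

def bm_tail_prob(eps, h):
    """P(|B_{s+h} - B_s| > eps) for standard Brownian motion (increment ~ N(0, h))."""
    return math.erfc(eps / math.sqrt(2.0 * h))

def poisson_tail_prob(eps, h, rate):
    """P(|N_{s+h} - N_s| > eps) for a unit-jump Poisson process, assuming 0 < eps < 1.

    With eps below the jump size this is the probability of at least one jump
    in a window of length h, i.e. 1 - exp(-rate * h).
    """
    return -math.expm1(-rate * h)

def tail_ratios(eps=0.5, rate=2.0, steps=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Return a list of (h, BM tail / h, Poisson tail / h) for shrinking h."""
    return [
        (h, bm_tail_prob(eps, h) / h, poisson_tail_prob(eps, h, rate) / h)
        for h in steps
    ]

if __name__ == "__main__":
    for h, bm, po in tail_ratios():
        print(f"h={h:.0e}  Brownian: {bm:.3e}  Poisson: {po:.3e}")
```

For Brownian motion the ratio goes to zero as $h$ shrinks, so the tail probability is $o(h)$. For the Poisson process the ratio approaches the jump rate instead, so the tail probability is of order $h$ and the first condition fails for any fixed jump size.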
Characterization by Fokker-Planck equation
In my question I said that a diffusion process is a Markov process whose forward equation is the Fokker-Planck equation. How can this be reconciled with the above definition?
The result says that if the transition probability has a density, $$ P(\Gamma,t|x,s) = \int_\Gamma p(s,x,t,y)dy,$$ and we assume that $a, b$ are differentiable and that 1., 2., 3. above hold, then the transition probability density satisfies the forward Kolmogorov equation.
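For reference, the forward Kolmogorov (Fokker-Planck) equation in this notation is usually written in the standard one-dimensional form below. I am assuming this is the version the result refers to, with $p = p(s,x,t,y)$ viewed as a function of the forward variables $(t,y)$ and with $b$ the second-moment coefficient from 3., which is where the factor $\tfrac12$ comes from:
$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}\big(a(y,t)\,p\big) + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b(y,t)\,p\big), \qquad p(s,x,t,\cdot) \to \delta_x \text{ as } t \downarrow s.$$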
Note in addition that the equation $dX_t = a(X_t,t)\, dt + \sqrt{b(X_t, t)}\, dB_t, \;\; X_0 = x_0$, with $b$ the diffusion coefficient from the definition above, does not specify a diffusion process unless the coefficients are continuous.
Something I am still confused about: I saw somewhere else that the drift is given by $$\mathbb{E}[X_{t+dt}-X_t| X_t = x] = a(x,t)dt + O(dt^2).$$ I'm not sure how this is equivalent to the integral in 2. above, or why there is an $O(dt^2)$ term (since the expectation should be fixed).
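One case where the remainder can be computed exactly may help here. This is only a sketch: the Ornstein-Uhlenbeck process and the symbols $\theta$, $\sigma$, $h$ are my own choice of illustration, not something from the definition above. For $dX_t = -\theta X_t\,dt + \sigma\,dB_t$ the conditional mean is $\mathbb{E}[X_{t+h}\mid X_t = x] = x e^{-\theta h}$, so
$$\mathbb{E}[X_{t+h}-X_t \mid X_t = x] = x\left(e^{-\theta h}-1\right) = -\theta x\, h + \frac{\theta^2 x}{2}h^2 + O(h^3).$$
Here $a(x,t) = -\theta x$, and the expectation is a fixed function of $h$ that is linear in $h$ only to leading order; the rest is the $O(h^2)$ term, which in particular is $o(h)$. In this example the increment is Gaussian with variance of order $h$, so restricting the integral to $|y-x|\leq\epsilon$ as in 2. changes it only by an exponentially small amount. I have not checked how far this extends beyond the example.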