Suppose we have a stochastic differential equation given by
$$\mathrm{d}X = N(X,t)\,\mathrm{d}t + M(X,t)\,\mathrm{d}B,$$
where $B$ is a Brownian motion. As far as I understand, we can think of this as
$$\frac{\mathrm{d}X}{\mathrm{d}t} = N(X,t) + M(X,t)e(t)$$
for a noise sequence $e$.
If we have a standard ordinary differential equation, then the slope of the function at some time is obtained by just plugging in the time. But here it looks like things are a little trickier.
It looks like we don't know the distribution of, say, $X'(t_0)$ until time $t_0$, as it depends on $X(t_0)$, whose value is in turn related to $X'(t_0 - \Delta)$, and so on.
If I got this right, we have to think of $X'(t_0)$ not only as random in the sense of a usual random variable, but also as depending on all the previous outcomes. So its distribution could be thought of as a composition of the distributions of all the previous $X'(t)$'s.
Does this kind of reasoning make sense?
I'm not sure it's reasonable to talk about $X'$, since $X$ will not be differentiable in the calculus sense. Sample paths of $X$ will not have a slope. I think the most intuitive way to think of this equation is via a discretization and then simulation:
$$X(t + \Delta t) - X(t) = N(X(t), t)\,\Delta t + M(X(t), t) \times \text{generate\_normal}(\mu = 0, \sigma = 1)\,\sqrt{\Delta t}$$
Note that the $\sqrt{\Delta t}$ on the stochastic portion is essential. It also prevents dividing this equation by $\Delta t$, as you would want to do to get a slope: the noise term would then scale like $1/\sqrt{\Delta t}$ and blow up as $\Delta t \to 0$.
Now given a deterministic implementation of $N(a,b)$ and $M(a,b)$ and some reasonable choice like $\Delta t = 0.001$, you can actually simulate sample paths of $X$. For example, defining $N = 0$ and $M = 1$ will make your program simulate Brownian motion.
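To make the discretization concrete, here is a minimal sketch in Python/NumPy (the function name `simulate_path` and its signature are my own, not from any standard library):

```python
import numpy as np

def simulate_path(N, M, x0, T, dt=0.001, rng=None):
    """Simulate one sample path of dX = N(X,t) dt + M(X,t) dB
    using the discretization above (Euler-Maruyama)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    t = np.linspace(0.0, T, n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        z = rng.standard_normal()  # generate_normal(mu=0, sigma=1)
        x[i + 1] = x[i] + N(x[i], t[i]) * dt + M(x[i], t[i]) * z * np.sqrt(dt)
    return t, x

# N = 0, M = 1: the program simulates standard Brownian motion
t, b = simulate_path(lambda x, t: 0.0, lambda x, t: 1.0, x0=0.0, T=1.0)
```

Plotting `b` against `t` for a few independent calls gives the familiar jagged Brownian paths, which is exactly why no slope exists in the limit.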
And it's true that future values of $X$ will depend on past values. Once you simulate many sample paths, you can do stuff like take the average of the sample paths at time $T$.
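For instance, averaging many simulated paths at time $T$ can be vectorized over paths. A sketch, using an Ornstein–Uhlenbeck-style drift $N(x,t) = -x$, $M = 1$ as my own illustrative example (the mean should decay like $e^{-T} x_0$):

```python
import numpy as np

def mc_mean_at_T(N, M, x0, T, dt=0.001, n_paths=5000, seed=0):
    """Average n_paths Euler-Maruyama sample paths at time T,
    stepping all paths forward simultaneously as one NumPy array."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.full(n_paths, x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        z = rng.standard_normal(n_paths)  # one normal draw per path
        x = x + N(x, t) * dt + M(x, t) * z * np.sqrt(dt)
        t += dt
    return x.mean()

# Mean-reverting drift toward 0; the sample average at T=1 should be near exp(-1)*x0
m = mc_mean_at_T(lambda x, t: -x, lambda x, t: 1.0, x0=1.0, T=1.0)
```

The Monte Carlo error here shrinks like $1/\sqrt{\text{n\_paths}}$, so with 5000 paths the average at $T = 1$ lands close to the theoretical $e^{-1} \approx 0.37$.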