I am currently working through Bernt Øksendal's *Stochastic Differential Equations*, and I am having some trouble fully understanding the definition of the Ito integral (Chapter 3).
My question is this...
Is there an intuitive explanation as to why, in the definition of the Ito integral, we require the 'elementary' functions to converge in the $L^2$ space?
It would be great to have a.s. pathwise convergence, but the Ito construction does not provide it for random integrands, even when they are non-anticipating. A weaker notion that we might hope for is pathwise convergence in probability, and the Ito construction does give us that. If we are more optimistic, we might hope for pathwise convergence in $L^p$. We get that for $p=2$, and therefore for all $p \leq 2$, because on a probability space Jensen's inequality gives $E|X|^p \leq (E|X|^2)^{p/2}$ for $0 < p \leq 2$. This is useful because it means that we can approximate expectations and variances of solutions to an SDE like
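To see the $L^2$ convergence concretely, here is a minimal Monte Carlo sketch for one illustrative integrand that is not taken from the post: $f(t,\omega) = W_t$ on $[0,T]$. The elementary approximation uses the left-endpoint value $W_{t_j}$ on each subinterval $[t_j, t_{j+1})$, and its Ito sum is compared with the exact value $\int_0^T W\,dW = (W_T^2 - T)/2$. For this integrand one can check that $E[(I_n - I)^2] = T^2/(2n)$ exactly, so the estimated mean-square error should shrink roughly tenfold when $n$ goes from 10 to 100. The function name `ito_sum_l2_error` and its parameters are hypothetical.

```python
import math
import random


def ito_sum_l2_error(n_steps, n_paths=4000, T=1.0, seed=0):
    """Monte Carlo estimate of E[(I_n - I)^2], where I_n is the Ito sum of the
    elementary (left-endpoint) integrand phi_n(t) = W_{t_j} on [t_j, t_{j+1}),
    and I = int_0^T W dW = (W_T^2 - T) / 2 is the exact Ito integral."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        w = 0.0          # current value W_{t_j}
        ito_sum = 0.0    # running sum of W_{t_j} (W_{t_{j+1}} - W_{t_j})
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            ito_sum += w * dw   # integrand evaluated at the LEFT endpoint
            w += dw
        exact = 0.5 * (w * w - T)
        total += (ito_sum - exact) ** 2
    return total / n_paths


for n in (10, 100):
    # Theory for this integrand with T = 1: E[(I_n - I)^2] = 1 / (2 n)
    print(n, ito_sum_l2_error(n), 1.0 / (2 * n))
```

Evaluating the integrand at the left endpoint is what makes the approximation non-anticipating; this is exactly the structure of Oksendal's elementary functions.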
$$dX_t=b(X_t) dt + \sigma(X_t) dW_t$$
by computing the corresponding expectations and variances of a Markov chain whose increment at a point $x$ is $N(b(x)\,\Delta t, \sigma(x)^2\,\Delta t)$-distributed, for small positive $\Delta t$. (In the vector case, replace $\sigma(x)^2$ by $\sigma(x) \sigma(x)^T$.) If we had convergence in measure but not in $L^p$, most of our samples would behave nicely for small enough $\Delta t$, but occasionally they would have errors large enough to spoil the expectation and variance estimates.
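The Markov chain described above is the Euler–Maruyama scheme. Below is a minimal sketch that estimates the mean and variance of $X_T$ from it. The test case is an assumption chosen only because its moments are known in closed form: the Ornstein–Uhlenbeck process $b(x) = -\theta x$, $\sigma(x) \equiv s$, for which $E[X_T] = x_0 e^{-\theta T}$ and $\operatorname{Var}(X_T) = \frac{s^2}{2\theta}(1 - e^{-2\theta T})$. The helper name `euler_maruyama_moments` and all parameter values are hypothetical.

```python
import math
import random
import statistics


def euler_maruyama_moments(b, sigma, x0, T, dt, n_paths=10000, seed=1):
    """Simulate X_{k+1} = X_k + b(X_k) dt + sigma(X_k) sqrt(dt) Z_k with Z_k ~ N(0, 1),
    i.e. increments distributed as N(b(x) dt, sigma(x)^2 dt), and return the
    sample mean and sample variance of X at time T."""
    rng = random.Random(seed)
    n_steps = round(T / dt)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += b(x) * dt + sigma(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        finals.append(x)
    return statistics.fmean(finals), statistics.variance(finals)


# Assumed Ornstein-Uhlenbeck test case: dX = -theta X dt + s dW
theta, s, x0, T = 1.0, 1.0, 1.0, 1.0
mean, var = euler_maruyama_moments(lambda x: -theta * x, lambda x: s, x0, T, dt=0.01)
exact_mean = x0 * math.exp(-theta * T)
exact_var = s**2 / (2 * theta) * (1 - math.exp(-2 * theta * T))
print(mean, exact_mean)
print(var, exact_var)
```

The remaining discrepancy mixes Monte Carlo noise with the $O(\Delta t)$ bias of the scheme; the $L^2$ convergence is what guarantees that no rare, huge errors hide inside these averages.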
There are also some more abstract considerations here that relate to the space being $L^2$ specifically. For example, because we have convergence in $L^2$, we can make sense of the power spectrum, and we can prove the Ito isometry $E\big[(\int_0^T f\,dW_t)^2\big] = E\big[\int_0^T f^2\,dt\big]$.
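A minimal numerical check of the Ito isometry for one elementary integrand, assuming (for illustration only) $\phi(t) = W_{t_j}$ on $[t_j, t_{j+1})$. For elementary integrands the isometry holds exactly because the cross terms $E[e_i e_j \Delta W_i \Delta W_j]$ vanish for $i \neq j$. Here both sides equal $\sum_j t_j\,\Delta t = T^2 (n-1)/(2n)$, so the two Monte Carlo averages should agree up to sampling noise. This identity on elementary functions is what lets the integral be extended by $L^2$ limits. The function name `ito_isometry_check` is hypothetical.

```python
import math
import random


def ito_isometry_check(n_steps=50, n_paths=10000, T=1.0, seed=2):
    """For the elementary integrand phi(t) = W_{t_j} on [t_j, t_{j+1}), estimate
    both sides of E[(int phi dW)^2] = E[int phi^2 dt] by Monte Carlo."""
    rng = random.Random(seed)
    dt = T / n_steps
    lhs = rhs = 0.0
    for _ in range(n_paths):
        w = 0.0
        integral = 0.0   # int phi dW  = sum_j W_{t_j} (W_{t_{j+1}} - W_{t_j})
        energy = 0.0     # int phi^2 dt = sum_j W_{t_j}^2 dt
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            integral += w * dw
            energy += w * w * dt
            w += dw
        lhs += integral ** 2
        rhs += energy
    return lhs / n_paths, rhs / n_paths


lhs, rhs = ito_isometry_check()
exact = (50 - 1) / (2 * 50)   # T^2 (n - 1) / (2 n) with T = 1, n = 50
print(lhs, rhs, exact)
```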