Let
- $(\Omega,\mathcal A,\operatorname P)$ be a probability space
- $\mathbb F=(\mathcal F_t)_{t\ge 0}$ be a filtration on $(\Omega,\mathcal A)$
- $B=(B_t)_{t\ge 0}$ be a Brownian motion on $(\Omega,\mathcal A,\operatorname P)$ with respect to $\mathbb F$
- $H=(H_t)_{t\ge 0}$ be $\mathbb F$-adapted and $\mathbb F$-progressively measurable with $$\operatorname E\left[\int_0^\infty H_t^2\;dt\right]<\infty$$
Under these conditions, the Itô integral of $H$ with respect to $B$ $$\int_0^\infty H_s\;dB_s$$ is well-defined. Now, let
- $b,\sigma:[0,\infty)\times\mathbb R\to\mathbb R$ be Borel measurable
and suppose we are considering a "process" $X=(X_t)_{t\ge 0}$ whose local behavior in time can be described by a differential equation $$dX_t=b\left(t,X_t\right)dt\;.\tag{1}$$ However, maybe $(1)$ is not accurate because the process is disturbed by a random influence.
The Brownian motion at time $t$ is $\mathcal N_{0,\;t}$-distributed and the increments $B_t-B_s$ are $\mathcal N_{0,\;t-s}$-distributed. So, since the normal distribution occurs almost naturally in many practical problems, it makes sense to me that we somehow want to integrate a function of $X$ with respect to $B$ and add this term to $(1)$.
Lebesgue–Stieltjes integration would make perfect sense to me. Each piece of the integration interval would be weighted by a normally distributed factor (the corresponding increment of the Brownian motion).
However, we all know that Lebesgue–Stieltjes integration with the Brownian motion as the integrator is impossible, since almost every Brownian path has unbounded variation on every interval.
So, we use the Itô integral and model our problem by $$dX_t=b(t,X_t)dt+\sigma(t,X_t)dB_t\tag{2}\;.$$ While I know that the Itô integral has beautiful properties like being a continuous $\mathbb F$-martingale, I ask myself in what sense $(2)$ is still appropriate for our problem.
Let $X$ be the solution of $(2)$. How can we justify that $X$ is really a solution for our problem? $X$ should be the solution of $(1)$ (which is the perfect model function without any disturbance) plus some normally distributed distortion whose intensity depends on $\sigma$.
Honestly, I don't see that $X$ has this property.
There is not one answer to this question. Your question is really one about modeling, rather than being strictly about mathematics, so the best answer depends on what you're trying to model.
Two answers that come to mind for me are as follows. One would be to instead consider applying small i.i.d. mean-zero normally distributed perturbations every $\Delta t$ and sending $\Delta t \to 0$. A scaling argument shows that the accumulated perturbation will either diverge or have no effect unless the variance of each perturbation is a multiple of $\Delta t$ (maybe dependent on time, but only as a factor). This is the same scaling argument that takes place when you derive the heat equation from the simple symmetric random walk. If the variance doesn't have a time dependence, then you get a multiple of Brownian motion as $\Delta t \to 0$. If there is a time dependence but no $x$ dependence, then you get an Itô integral (or whatever other stochastic integral, it doesn't matter). Only when there is an $x$ dependence is there any real subtlety.
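To make the scaling argument concrete, here is a minimal sketch in plain Python. It assumes each perturbation has variance $c\,\Delta t^{\alpha}$; the exponent `alpha`, the constant `c` and the function names are my own illustrative choices, not anything from the question. Summing the $t/\Delta t$ independent perturbations up to time $t$ gives variance $c\,t\,\Delta t^{\alpha-1}$, which blows up for $\alpha<1$, vanishes for $\alpha>1$, and stays at $c\,t$ only for $\alpha=1$.

```python
import random
import statistics


def summed_noise_variance(t, dt, alpha, c=1.0):
    """Exact variance of the sum of n = t/dt iid N(0, c*dt**alpha) perturbations."""
    n = round(t / dt)
    return n * c * dt ** alpha


def empirical_variance(t, dt, alpha, n_paths=2000, seed=0):
    """Monte Carlo estimate of the same variance (with c = 1), as a sanity check."""
    rng = random.Random(seed)
    n = round(t / dt)
    sd = (dt ** alpha) ** 0.5  # standard deviation of a single perturbation
    totals = [sum(rng.gauss(0.0, sd) for _ in range(n)) for _ in range(n_paths)]
    return statistics.pvariance(totals)


if __name__ == "__main__":
    # Hypothetical exponents: too much noise, the Brownian scaling, too little noise.
    for alpha in (0.5, 1.0, 2.0):
        for dt in (1e-1, 1e-2, 1e-3, 1e-4):
            print(f"alpha={alpha}, dt={dt}: variance at t=1 is "
                  f"{summed_noise_variance(1.0, dt, alpha):.6g}")
    print("Monte Carlo check, alpha=1, dt=0.01:", empirical_variance(1.0, 0.01, 1.0))
```

Only the case `alpha = 1` gives an accumulated variance that neither diverges nor disappears as `dt` shrinks, which is exactly the variance $t$ of a (multiple of) Brownian motion.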
The other is that the property:
$$\lim_{\Delta t \to 0} \mathbb{E} \left ( \frac{X_{t+\Delta t}-X_t}{\Delta t} \mid X_t = x \right ) = b(t,x)$$
is desirable. This property essentially says "$b$ is the average drift of the process", which at least to me seems essential in order to think of $b$ as drift and $\sigma$ as diffusion. This requirement forces the noise to be a local martingale, and one can prove that no nonconstant continuous local martingale has paths of bounded variation. So the noise term cannot be a Lebesgue–Stieltjes integral.
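As a sanity check of the drift property, here is a small sketch for one SDE whose transition law is known in closed form, the Ornstein–Uhlenbeck equation $dX_t=-\theta X_t\,dt+\sigma\,dB_t$, so $b(t,x)=-\theta x$. The parameter values and function names are hypothetical choices of mine. Given $X_t=x$, the value $X_{t+h}$ is normal with mean $xe^{-\theta h}$ and variance $\sigma^2(1-e^{-2\theta h})/(2\theta)$, so the conditional mean of the difference quotient is $x(e^{-\theta h}-1)/h$, which tends to $-\theta x$ as $h\to 0$.

```python
import math
import random

THETA, SIGMA = 1.5, 0.5  # hypothetical parameters, chosen only for illustration


def b(x):
    """Drift of the Ornstein-Uhlenbeck SDE dX = -THETA * X dt + SIGMA dB."""
    return -THETA * x


def exact_mean_rate(x, h):
    """E[(X_{t+h} - X_t) / h | X_t = x], from the known Gaussian transition law."""
    return x * math.expm1(-THETA * h) / h


def sampled_mean_rate(x, h, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the same quantity, sampling the exact transition."""
    rng = random.Random(seed)
    mean = x * math.exp(-THETA * h)
    sd = SIGMA * math.sqrt(-math.expm1(-2.0 * THETA * h) / (2.0 * THETA))
    total = 0.0
    for _ in range(n_samples):
        total += (rng.gauss(mean, sd) - x) / h
    return total / n_samples


if __name__ == "__main__":
    x = 2.0
    print("b(x) =", b(x))
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        print(f"h={h}: exact conditional mean rate = {exact_mean_rate(x, h):.6f}")
    print("Monte Carlo estimate at h=0.01:", sampled_mean_rate(x, 0.01))
```

Note that a single difference quotient is very noisy, with standard deviation of order $\sigma/\sqrt h$; only its conditional mean settles down to $b$.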
This is also the only sense in which the process is described by the unperturbed solution plus a noisy disturbance. It actually isn't that simple globally: for instance, you might have two stable equilibria and the noise can take you between them. In this case the perturbed process has qualitatively different behavior from the unperturbed process. But it is like that locally, in that the analogue of the derivative of the process is indeed $b$ at each point.
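The two-equilibria remark can be illustrated with a rough simulation. The sketch below assumes the double-well drift $b(x)=x-x^3$ (stable equilibria at $\pm1$, unstable at $0$), a constant $\sigma$, and a plain Euler–Maruyama discretisation; all of these, along with the step size, horizon and seed, are illustrative assumptions of mine rather than anything canonical.

```python
import math
import random


def drift(x):
    """Hypothetical double-well drift: stable equilibria at -1 and +1."""
    return x - x ** 3


def euler_maruyama(x0, sigma, t_end=500.0, dt=0.01, seed=0):
    """Euler-Maruyama path for dX = drift(X) dt + sigma dB, returned as a list."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(round(t_end / dt)):
        # One step: deterministic drift plus a N(0, sigma^2 * dt) increment.
        x = x + drift(x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


if __name__ == "__main__":
    unperturbed = euler_maruyama(1.0, sigma=0.0)
    perturbed = euler_maruyama(1.0, sigma=0.8)
    print("unperturbed range:", min(unperturbed), max(unperturbed))
    print("perturbed range:  ", min(perturbed), max(perturbed))
```

With `sigma=0.0` the path started at the equilibrium $1$ never leaves it, while with the noise switched on the path should wander into the neighbourhood of $-1$ and back over a long enough horizon, even though the local mean rate of change is still $b$ at every point.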