I am working on a problem where I wish to "fit" one stochastic process model, defined by an SDE that we can adjust via parameters (a Neural SDE), to a target process, which we assume can be defined by another SDE. Call the target process $P_t$ and the Neural SDE $S^\theta_t$. I have come up with a way to fit $S^\theta_t$ to the target process using the Wasserstein distance, so that after fitting, I can say that $S^\theta_t$ is equal to $P_t$ in the sense of law: the distributions produced by both processes are exactly the same at all time steps, given the same starting point and starting time. That is, for every bounded continuous test function $f$:
$$\forall t>0,\ \bigg| \mathbb{E}[f(S^\theta_t)] - \mathbb{E}[f(P_t)] \bigg| = 0$$
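A minimal sketch of how this criterion could be checked empirically on simulated samples. The helper names `brownian_paths` and `marginal_gap`, the test function `np.tanh`, and the toy Brownian samples are illustrative assumptions; this is not the Wasserstein fitting procedure itself.

```python
import numpy as np

def brownian_paths(n_paths, n_steps, horizon, rng):
    """Simulate Brownian paths on a uniform grid; shape (n_paths, n_steps)."""
    dt = horizon / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.cumsum(increments, axis=1)

def marginal_gap(samples_s, samples_p, f):
    """Monte Carlo estimate of max over time steps of |E f(S_t) - E f(P_t)|."""
    gap_per_step = np.abs(f(samples_s).mean(axis=0) - f(samples_p).mean(axis=0))
    return float(gap_per_step.max())

rng = np.random.default_rng(0)
# Two independent sample sets with the same law (stand-ins for S^theta and P).
samples_s = brownian_paths(4000, 200, 1.0, rng)
samples_p = brownian_paths(4000, 200, 1.0, rng)

gap = marginal_gap(samples_s, samples_p, np.tanh)
print(f"max marginal gap: {gap:.4f}")
```

A small value only indicates that the marginals agree up to Monte Carlo error for the chosen $f$; it says nothing about whether the two sets of paths coincide.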
However, what I am really after is to match the pathwise unique solution of the SDE for $P_t$ with $S^\theta_t$. The distinction may be subtle, but equality in law alone does not let me conclude that I have approximated the unique pathwise solution. For example, for a Brownian motion $B_t$, the running maximum $\max_{0 \le s \le t} B_s$ and $|B_t|$ have the same law at each fixed time $t$, yet they are different processes path-wise.
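The example can be illustrated numerically. The following is a minimal sketch, assuming a simple grid simulation of Brownian motion; the function names, sample sizes, and the sorted-sample estimate of the one-dimensional Wasserstein-1 distance are choices made for illustration. The grid maximum slightly underestimates the continuous-time maximum, so the distributional gap is small but not exactly zero.

```python
import numpy as np

def simulate_brownian_paths(n_paths, n_steps, horizon, seed=0):
    """Brownian paths on a uniform grid, including the starting value 0."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    return np.concatenate([np.zeros((n_paths, 1)), paths], axis=1)

def empirical_w1(x, y):
    """1D Wasserstein-1 distance between two equal-size samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

paths = simulate_brownian_paths(n_paths=5000, n_steps=500, horizon=1.0)
running_max = paths.max(axis=1)    # max over the grid of B_s, s <= 1
abs_end = np.abs(paths[:, -1])     # |B_1|

# Distance between the two marginal laws at time 1.
law_gap = empirical_w1(running_max, abs_end)
# Average distance between the two quantities along the same path.
path_gap = float(np.mean(np.abs(running_max - abs_end)))

print(f"gap in law:   {law_gap:.3f}")
print(f"gap pathwise: {path_gap:.3f}")
```

The gap in law is close to zero while the pathwise gap is not, which is exactly the situation a purely distributional fitting criterion cannot detect.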
I'm looking for some additional (light) assumptions or constraints that we can apply to our fitting criterion to ensure that we are in fact fitting the path-wise solution to the target SDE. I am attracted to the classic uniqueness theorem from Øksendal, attached. Would it be enough to enforce that the drift and diffusion terms of both processes satisfy the Lipschitz conditions described in the theorem?
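For concreteness, write the target SDE as $dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t$ on $[0, T]$. The symbols $b$, $\sigma$, $C$ and $D$ are notation assumed here, following the usual statement of the theorem. The conditions in question are then linear growth and global Lipschitz continuity in the state variable:
$$|b(t,x)| + |\sigma(t,x)| \le C\,(1 + |x|), \qquad |b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le D\,|x - y|,$$
for all $x, y$ and all $t \in [0, T]$. As I understand it, the uniqueness these conditions give is for solutions driven by a given Brownian motion $B_t$.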
