Consider a stochastic differential equation evolving on $\mathbb R$
\begin{equation} dx_t = f(x_t)dt + c dw_t ,\quad x_0 = y \in \mathbb R \end{equation}
where $f: \mathbb R \to \mathbb R$, $c \in \mathbb R$, and $w_t$ is a one-dimensional Wiener process (Brownian motion).
In [1, eq. 8.6], Dürr and Bach showed that, when $f\in C^2(\mathbb R)$, the most likely trajectory taken by $x_t$ on the time interval $[0,T]$ is the solution to
\begin{equation} \ddot z_t = f(z_t) f'(z_t) + \frac {c^2} 2 f''(z_t), \quad z_0 = y, \quad \dot z_T = f(z_T) \end{equation}
Note: In [1] the most likely trajectory is defined as the differentiable path starting at $y$ whose surrounding tube of radius $\epsilon$ in the uniform norm has maximal probability under $x_t$. They show that this is well-defined provided that $\epsilon$ is below some threshold.
I am struggling to get an intuition for this result. Intuitively, one can interpret the SDE as repeatedly adding Gaussian noise to the ODE $dx_t = f(x_t)dt$ (for instance, this is what happens when one integrates the SDE using the Euler-Maruyama scheme). This leads one to think that the most likely path would be given by $\dot z_t = f(z_t), z_0=y$. (I know that this mental picture is inaccurate, since it ignores the contribution of the noise to the derivative of the process and hence its influence on the motion.)
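To make the "ODE step plus Gaussian kick" picture concrete, here is a minimal Euler-Maruyama sketch. The function name `euler_maruyama`, the double-well drift $f(x) = x - x^3$, and all parameter values are illustrative assumptions, not taken from [1].

```python
import numpy as np

def euler_maruyama(f, c, y, T, n_steps, rng):
    """Simulate one path of dx_t = f(x_t) dt + c dw_t with x_0 = y."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = y
    # Brownian increments are independent N(0, dt) draws.
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        # Explicit Euler step of the ODE plus a Gaussian kick of variance c^2 * dt.
        x[k + 1] = x[k] + f(x[k]) * dt + c * dw[k]
    return x

# Hypothetical example drift (an assumption for illustration only).
def f(x):
    return x - x**3

rng = np.random.default_rng(0)
path = euler_maruyama(f, c=0.5, y=0.0, T=1.0, n_steps=1000, rng=rng)
# With c = 0 the scheme reduces to explicit Euler for the ODE dx/dt = f(x).
noiseless = euler_maruyama(f, c=0.0, y=0.5, T=1.0, n_steps=1000, rng=rng)
```

Setting `c=0.0` recovers the deterministic flow $\dot z_t = f(z_t)$, which is the path the naive picture singles out.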
Question: Is there some intuition as to why the expression for the most likely path of the process makes sense?
Bonus question: What if, instead of Brownian noise, we considered the same SDE driven by smooth noise (e.g. Brownian motion convolved with a mollifier)? Would the most likely path still be very different from $\dot z_t = f(z_t)$? I am guessing that, by virtue of the Wong-Zakai theorems, the situation will be analogous to the Brownian case.
[1] Dürr, Bach (1978). The Onsager-Machlup function as Lagrangian for the most probable path of a diffusion process. Communications in Mathematical Physics.
Addendum: According to the Stratonovich path integral formalism as presented in [2], the action (i.e. the negative log probability) of a path $x= \left\{x_t, t\in [0,T]\right\}$ is, up to an additive constant:
\begin{equation} -\log p(x \mid x_0=y ) =\int_0^T \left[ \frac 1 {2c^2}(\dot{x}_t-f(x_t))^2 + \frac 1 2 f'(x_t) \right] d t \end{equation}
If $f'$ is constant, then the $f'$ term only shifts the action by a constant, so the most likely path, i.e. the path of least action, is simply $\dot z_t = f(z_t), z_0=y$; this is also what we find by solving the equation of Dürr and Bach in this case. In the non-linear case, the non-constant derivative term $f'$ leads to the more complex expression for the most likely path. In particular, if we apply the Euler-Lagrange equations to the action, we recover the second-order ODE of Dürr and Bach (this is how they obtain the most likely path in their paper). Interestingly, in the limit of small $c$, the contribution of $f'$ to the action becomes small relative to that of $\frac 1 {2c^2}(\dot{x}_t-f(x_t))^2$, and so the most likely path tends to $\dot z_t = f(z_t), z_0=y$.
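For reference, here is a sketch of the Euler-Lagrange computation, assuming the Lagrangian carries the factor $\frac 1 2$ on $f'$ (this is the form that reproduces the Dürr-Bach equation; the symbol $L$ is my own notation):
\begin{equation} L(z, \dot z) = \frac 1 {2c^2}(\dot z - f(z))^2 + \frac 1 2 f'(z), \qquad \frac{\partial L}{\partial \dot z} = \frac{\dot z - f(z)}{c^2}, \qquad \frac{\partial L}{\partial z} = -\frac{(\dot z - f(z)) f'(z)}{c^2} + \frac 1 2 f''(z) \end{equation}
Setting $\frac{d}{dt}\frac{\partial L}{\partial \dot z} = \frac{\partial L}{\partial z}$ gives
\begin{equation} \frac{\ddot z_t - f'(z_t)\dot z_t}{c^2} = -\frac{(\dot z_t - f(z_t)) f'(z_t)}{c^2} + \frac 1 2 f''(z_t) \quad\Longrightarrow\quad \ddot z_t = f(z_t) f'(z_t) + \frac{c^2}{2} f''(z_t) \end{equation}
The $f'(z_t)\dot z_t$ terms cancel, which is why only $f f'$ and $f''$ survive. Since the endpoint $z_T$ is free, the natural boundary condition $\left.\frac{\partial L}{\partial \dot z}\right|_{t=T} = 0$ yields $\dot z_T = f(z_T)$.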
[2] Seifert (2012). Stochastic thermodynamics, fluctuation theorems and molecular machines. Reports on Progress in Physics.