It is natural to wonder about a Brownian motion with a drift toward $0$ whose rate is equal to the current value of the process. Unlike the standard Wiener process, which is null-recurrent, this is positive-recurrent and seldom wanders very far from $0.$
But the way I have seen it presented is this:
$$\tag 1 \text{Let } X(t) = e^{-t}W(e^{2t}),\qquad\qquad\qquad\qquad$$ where $W$ is the standard Wiener process. Now derive its covariance function and its drift.
Would any reasonable person think something like that is of interest other than as a routine exercise for undergraduates unless they knew the (to me) unexpected result?
What sort of thought process could possibly lead someone from wondering about the mean-reverting Brownian motion to coming up with line $(1)$ above as the answer?
The O-U process was introduced as a solution to a Langevin equation $dX_t = - X_t \, dt +dB_t$ by Ornstein & Uhlenbeck (1930), and the solution of that equation was made rigorous by Doob in 1942, at a time when Itô's work was little known (if at all) in the West.
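For reference, here is a sketch of how the Langevin equation is solved explicitly. It uses the integrating factor $e^{t}$; no Itô correction term appears because $e^{t}$ is deterministic.
$$ d\bigl(e^{t}X_t\bigr) = e^{t}X_t\,dt + e^{t}\,dX_t = e^{t}\,dB_t, \qquad\text{so}\qquad e^{t}X_t = X_0 + \int_0^t e^{s}\,dB_s. $$
Multiplying through by $e^{-t}$ gives $X_t = X_0e^{-t} + \int_0^t e^{-(t-s)}\,dB_s$.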
Because the integrand in the solution $X_t=X_0e^{- t}+\int_0^t e^{-(t-s)} \, dB_s$ ($X_0$ standard normal and independent of the Brownian motion $B$) is non-random, it's clear that the process is Gaussian, with mean $0$ and covariance $\Gamma (s,t) = e^{-t}\cosh(s)$ for $0\le s\le t$. For those of us of a certain age, who learned about things like the Brownian bridge from Patrick Billingsley's book, it's natural to try to cook up a Gaussian process with a given covariance by looking at something like $g(t)[Z+\tilde W(h(t))]$, with $g$ and $h$ to be chosen to obtain the desired covariance. (Here $\tilde W$ is a standard Brownian motion and $Z$ is an independent standard normal.) The choices $g(t) = e^{-t}$ and $h(t) = e^{2t}-1$ give covariance $e^{-(t-s)}$ for $0\le s\le t$, which is the O-U covariance when the noise is scaled to $\sqrt 2\,dB_t$ so that the stationary variance is $1$, and they yield the "out-of-left-field" construction of O-U; notice that $W(u) = Z+\tilde W(u-1)$ for $u\ge 1$. (One advantage of $e^{-t}W(e^{2t})$ is that it's defined for all real $t$ and is a stationary version of the O-U process.)
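The covariance of the time-changed construction can be checked numerically. The following is a minimal sketch, not anyone's canonical implementation: the function name `sample_time_changed_ou`, the sample times, the seed, and the number of paths are all illustrative choices of mine. It samples $e^{-t}W(e^{2t})$ at a few times (including a negative one) and compares the empirical covariance matrix with $e^{-|t-s|}$.

```python
import numpy as np

def sample_time_changed_ou(times, n_paths, rng):
    """Sample X(t) = exp(-t) * W(exp(2t)) at increasing times (hypothetical helper)."""
    times = np.asarray(times, dtype=float)
    clock = np.exp(2.0 * times)              # Brownian clock u = e^{2t}
    # W at increasing clock values: independent N(0, du) increments, with W(0) = 0.
    dclock = np.diff(clock, prepend=0.0)
    increments = rng.standard_normal((n_paths, times.size)) * np.sqrt(dclock)
    w = np.cumsum(increments, axis=1)
    return np.exp(-times) * w

rng = np.random.default_rng(0)               # fixed seed, chosen arbitrarily
times = np.array([-0.5, 0.0, 0.7])           # negative times are allowed
x = sample_time_changed_ou(times, 200_000, rng)

# The process has mean zero, so the second-moment matrix estimates the covariance.
empirical = (x.T @ x) / x.shape[0]
theoretical = np.exp(-np.abs(times[:, None] - times[None, :]))
max_abs_error = np.abs(empirical - theoretical).max()
print(max_abs_error)
```

The printed discrepancy should be on the order of the Monte Carlo error, roughly $1/\sqrt{n}$ for $n$ paths, and the diagonal of `empirical` should be close to $1$ at every sampled time, reflecting stationarity.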
I'm not sure who came up with this recipe for O-U, but I first saw it in a paper by David Williams circa 1980, using the idea to construct the infinite-dimensional O-U process, with values in Wiener space rather than Euclidean space.