Consider the following discrete-time, finite-horizon control problem with states $s \in \mathbb{R}^p$, actions $a \in \mathbb{R}^q$, and objective: $$ \max \mathbb{E} \sum_{t=0}^{T-1} r(s_t, a_t) $$ where $r(s_t, a_t)$ is a known reward function, the state transition is given by $s_{t+1} = f(s_t, a_t) + \omega_t$ with $f$ known and $\omega_t$ some additive noise, the initial state $s_0$ is fixed, and $a_t = \pi_t (s_t)$.
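For reference, the corresponding backward (dynamic programming) recursion for this problem would be

$$ V_T(s) = 0, \qquad V_t(s) = \sup_{a} \Big\{ r(s, a) + \mathbb{E}_{\omega_t}\big[ V_{t+1}\big(f(s, a) + \omega_t\big) \big] \Big\}, \quad t = T-1, \dots, 0, $$

and an optimal $\pi_t(s)$ would be any measurable selection attaining the supremum; the question is under what assumptions such a selection exists.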
Is there a proof of existence (under suitable assumptions) of an optimal time-dependent policy $\{\pi_t\}_{t=0}^{T-1}$, where each $\pi_t$ may be either deterministic or stochastic?
E.g., something analogous to the certainty-equivalence principle in the linear-quadratic (LQ) setting.
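For context, in the LQ special case the certainty-equivalence construction is explicit: with dynamics $s_{t+1} = A s_t + B a_t + \omega_t$ and reward $r(s, a) = -(s^\top Q s + a^\top R a)$, the optimal policy is $\pi_t(s) = -K_t s$, where the gains $K_t$ come from a backward Riccati recursion that ignores the noise entirely. A minimal numerical sketch (the function name `lqr_gains` and the choice of terminal cost $P_T = Q$ are illustrative assumptions, not from the question):

```python
import numpy as np

def lqr_gains(A, B, Q, R, T):
    """Backward Riccati recursion for finite-horizon LQR.

    Returns gains [K_0, ..., K_{T-1}] such that a_t = -K_t s_t is
    optimal for cost sum_t (s'Q s + a'R a). By certainty equivalence,
    the additive noise covariance does not enter the recursion.
    """
    P = Q.copy()  # illustrative terminal cost P_T = Q
    gains = []
    for _ in range(T):
        # K_t = (R + B' P_{t+1} B)^{-1} B' P_{t+1} A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_t = Q + A' P_{t+1} (A - B K_t)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder as K_0, ..., K_{T-1}
```

For example, in the scalar case $A = B = Q = R = 1$ with $T = 1$, this gives $K_0 = (1 + 1)^{-1} \cdot 1 = 0.5$. I am asking whether an analogous existence result (without the explicit gain formula) holds for general $f$ and $r$.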