Assume $u_t$ is a time-homogeneous Markov chain with unique stationary distribution $\pi$.
Consider iterations of the form
$$x_{t} = f(y_t,u_t)$$ $$y_{t+1} = y_t + \varepsilon_t \nabla_y g(v,w,y)\vert_{v = x_{t+1},w=x_t,y=y_t},$$
where $g$ is some function.
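For concreteness, here is a toy numerical sketch of this update rule. All concrete choices below ($f$, $g$, the two-state chain, the step sizes) are my own hypothetical instantiations, not part of the question; I also read $x_{t+1}$ as $f(y_t, u_{t+1})$, i.e. computed with the current $y_t$.

```python
import random

random.seed(0)

# Hypothetical toy instance (f, g, and the chain are illustrative choices):
#   u_t : two-state Markov chain on {0.5, 1.5} with stay-probability 0.9
#   f(y, u) = u + 0.5 * y
#   g(v, w, y) = -(y - (v + w) / 2)^2
# The partial gradient in y ignores the dependence of v, w on y, as in the
# question.  For these choices the averaged update has the fixed point
# y* = 2 * E_pi[u] = 2.

STATES = [0.5, 1.5]
STAY = 0.9

def step_chain(i):
    """One transition of the two-state Markov chain (i is the state index)."""
    return i if random.random() < STAY else 1 - i

def f(y, u):
    return u + 0.5 * y

def grad_y_g(v, w, y):
    """Partial derivative of g(v, w, y) in y only."""
    return -2.0 * (y - (v + w) / 2.0)

y = 0.0
i = 0                          # current state index of u_t
x = f(y, STATES[i])            # x_t = f(y_t, u_t)
for t in range(200_000):
    i_next = step_chain(i)
    x_next = f(y, STATES[i_next])      # x_{t+1}, computed with the current y_t
    eps = 1.0 / (t + 10)               # Robbins-Monro style step sizes
    y += eps * grad_y_g(x_next, x, y)
    i, x = i_next, x_next

print(y)  # empirically close to the fixed point y* = 2 of the averaged update
```

In this particular toy case the update is a contraction in expectation, so the iterates settle near $y^\*$; of course this says nothing about general $f$ and $g$, which is exactly what the question asks about.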
Is there any theory on how to check the convergence of this kind of update rule? There are some similarities to stochastic approximation / the Robbins–Monro algorithm, but of course there are key differences:
- The dependence of $x_t$ on both $y_t$ and $u_t$ means that $x_t$ itself is not time-homogeneous.
- The updates depend on both $x_{t+1}$ and $x_t$. (By itself this may not be a serious problem; at least for the driving noise, the pair process $(u_{t+1},u_t)$ is again a time-homogeneous Markov chain.)
- The update ignores the dependence of $x_t$ on $y_t$ by taking only the partial gradient of $g$ with respect to $y$.
My question is now whether, despite these differences, under some conditions the iterates will still converge to
$$y^* = \arg \max_y \mathbb{E}_{u_{t+1},u_t}[g(f(y^*,u_{t+1}),f(y^*,u_t),y^*)]$$
and whether there is a reference that might help to understand this setting.