I'm reading the following paper, in which stochastic gradient descent (SGD) consists of performing partial gradient updates of the form $$x^{k+1}=x^k-\eta_k\nabla f_{i_k}(x^{k-1})\qquad\mbox{ where }\eta_k>0,\ k\in\mathbb{N}, \qquad(*)$$ where at each iteration $i_k$ is sampled uniformly at random from the set $\{1,\dots,N\}$, starting from a random initial condition $x^0$. The stochastic nature of SGD arises from approximating the gradient using only a subset of the data points.
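For concreteness, here is a minimal sketch of the update $(*)$ in Python, with an illustrative quadratic choice of the $f_i$ (my own toy example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective: f(x) = (1/N) sum_i f_i(x) with
# f_i(x) = 0.5*||x - a_i||^2, so grad f_i(x) = x - a_i.
N, d = 50, 2
a = rng.normal(size=(N, d))          # per-sample "data" (illustrative only)

def grad_fi(x, i):
    return x - a[i]

x = rng.normal(size=d)               # random initial condition x^0
eta = 0.1                            # constant step size eta_k = eta
for k in range(1000):
    i_k = rng.integers(N)            # i_k uniform on {0, ..., N-1}
    x = x - eta * grad_fi(x, i_k)    # the update (*)

# SGD does not converge exactly: it fluctuates around the minimizer of f,
# which for this toy objective is the mean of the a_i.
print(np.linalg.norm(x - a.mean(axis=0)))
```

The residual fluctuation around the minimizer is exactly the noise that the diffusion approximation below tries to capture.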
My question is: why can the discrete-time dynamics in $(*)$ be modelled by the stochastic differential equation (SGD) $$dx(t)=-\nabla f(x(t))\,dt+\beta^{-1/2}\,dW(t)?$$
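To see the connection numerically, one can simulate the SDE with an Euler–Maruyama discretization, whose step has the same shape as $(*)$ with $\eta_k$ playing the role of $dt$. A minimal sketch, using the illustrative choice $f(x)=\tfrac12\|x\|^2$ (my assumption, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler–Maruyama for dx = -grad f(x) dt + beta^{-1/2} dW,
# with f(x) = 0.5*||x||^2, so grad f(x) = x.
beta, dt, steps = 10.0, 0.01, 5000
x = np.ones(2)
for _ in range(steps):
    dW = rng.normal(scale=np.sqrt(dt), size=x.shape)   # Brownian increment
    x = x - x * dt + beta**-0.5 * dW                   # drift + noise step

# The path fluctuates around the minimizer 0 at scale ~ beta^{-1/2}.
print(np.linalg.norm(x))
```

Note the structural match: one Euler–Maruyama step is $x^{k+1}=x^k-\nabla f(x^k)\,dt+\beta^{-1/2}\,\Delta W_k$, i.e. a full-gradient step plus Gaussian noise.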
Moreover, the generator $\mathcal{L}$ corresponding to (SGD) is defined for smooth functions $\phi$ as $$\mathcal{L}\phi=-\nabla f\cdot\nabla\phi+\frac{\beta^{-1}}{2}\Delta\phi.$$ Where does this come from?
Sorry if the question is too basic, but I'm new with this topic. Thanks!
Regarding the first question, I suppose the idea is to write the update as $$x^{k+1}-x^{k} = -\eta_k\nabla f_{i_k}(x^{k})$$ (note that I wrote $f_{i_k}(x^{k})$ rather than $f_{i_k}(x^{k-1})$, which I believe is a typo), to approximate the increment $x^{k+1}-x^{k}\sim dx$, and to treat the stochastic gradient as the "true" gradient plus a noise term: $\nabla f_{i_k}(x^{k})\sim \nabla f(x^{k})+\text{Gaussian noise}.$
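You can check the "true gradient plus noise" decomposition empirically: at a fixed $x$, the noise $\xi_k = \nabla f_{i_k}(x)-\nabla f(x)$ has mean zero, with covariance given by the empirical covariance of the per-sample gradients. A sketch with an illustrative quadratic $f_i$ (my own example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: f_i(x) = 0.5*||x - a_i||^2, so grad f_i(x) = x - a_i
# and the full gradient is grad f(x) = x - mean(a_i).
N, d = 500, 3
a = rng.normal(size=(N, d))
x = np.ones(d)

full_grad = x - a.mean(axis=0)

# Sample the gradient noise xi = grad f_i(x) - grad f(x) many times.
samples = np.array([(x - a[rng.integers(N)]) - full_grad
                    for _ in range(20000)])

print(samples.mean(axis=0))   # approximately zero: the noise is unbiased
print(np.cov(samples.T))      # covariance of the per-sample gradients
```

Replacing this noise by a Gaussian of matching covariance (justified heuristically, e.g. when the update aggregates many samples) is what produces the $\beta^{-1/2}\,dW$ term in the SDE.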
As for the second question, I think this is the Van Kampen expansion.
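Regardless of the name, the generator itself can be read off directly from Itô's formula (a standard computation, not specific to this paper). For smooth $\phi$, using $dx=-\nabla f\,dt+\beta^{-1/2}\,dW$ and $dW\,dW^\top = I\,dt$,
$$d\phi(x(t))=\nabla\phi\cdot dx+\frac{1}{2}\,dx^\top(\nabla^2\phi)\,dx=\Big(-\nabla f\cdot\nabla\phi+\frac{\beta^{-1}}{2}\Delta\phi\Big)\,dt+\beta^{-1/2}\,\nabla\phi\cdot dW(t).$$
Taking expectations kills the martingale term, so $\frac{d}{dt}\mathbb{E}[\phi(x(t))]=\mathbb{E}[(\mathcal{L}\phi)(x(t))]$, which is exactly the stated generator.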