Discrete-time dynamics can be modelled by the stochastic differential equation

143 Views Asked by At

I'm reading the following paper, in which the stochastic gradient descent consists of performing partial computations of the form $$x^{k+1}=x^k-\eta_k\nabla f_{i_k}(x^{k-1})\qquad\mbox{ where }\eta_k>0,k\in\mathbb{N} \qquad(*)$$ where $i_k$ is sampled uniformly at random from the set $\{1,...,N\}$, at each iteration, starting from a random initial condition $x^0$ . The stochastic nature of SGD arises from the approximation of the gradient using only a subset of data points.

My question is, why the discrete-time dynamics in (*) can then be modelled by the stochastic differential equation (SGD) $$dx(t)=-\nabla f(x(t))\,dt+\beta^{-1/2}\,dW(t)$$

Moroever, the generator, $\mathcal{L}$, corresponding to (SGD) is defined for smooth functions $\phi$ as, $$\mathcal{L}\phi=-\nabla f\cdot\nabla\phi+\frac{\beta^{-1}}{2}\Delta\phi$$

Sorry if the question is too basic, but I'm new with this topic. Thanks!

1

There are 1 best solutions below

0
On

Regarding the first question, I suppose the idea is to write the equation as $$x^{k+1}-x^{k} = -\eta_k\nabla f_{i_k}(x^{k})$$ (note that I wrote $f_{i_k}(x^{k})$ and not $f_{i_k}(x^{k-1})$ which I believe is a typo). and approximate $x^{k+1}-x^{k}\sim dx$ , and approximating the stochastic gradient term and treating it to be the same as the "true" gradient plus a noise term: $ \nabla f_{i_k}(x^{k})\sim \nabla f(x^{k})+\text{gaussian noise}$

As for the second question, I think this is the Van Kampen expansion.