Discrete-time dynamics can be modelled by the stochastic differential equation

143 Views Asked by Bumbble Comm At 31 Mar 2026 - 4:13

I'm reading the following paper, in which the stochastic gradient descent consists of performing partial computations of the form $$x^{k+1}=x^k-\eta_k\nabla f_{i_k}(x^{k-1})\qquad\mbox{ where }\eta_k>0,k\in\mathbb{N} \qquad(*)$$ where $i_k$ is sampled uniformly at random from the set $\{1,...,N\}$, at each iteration, starting from a random initial condition $x^0$ . The stochastic nature of SGD arises from the approximation of the gradient using only a subset of data points.

My question is, why the discrete-time dynamics in (*) can then be modelled by the stochastic differential equation (SGD) $$dx(t)=-\nabla f(x(t))\,dt+\beta^{-1/2}\,dW(t)$$

Moroever, the generator, $\mathcal{L}$, corresponding to (SGD) is defined for smooth functions $\phi$ as, $$\mathcal{L}\phi=-\nabla f\cdot\nabla\phi+\frac{\beta^{-1}}{2}\Delta\phi$$

Sorry if the question is too basic, but I'm new with this topic. Thanks!

Original Q&A

There are 1 best solutions below

Bumbble Comm On 26 Jul 2021 - 12:31

Regarding the first question, I suppose the idea is to write the equation as $$x^{k+1}-x^{k} = -\eta_k\nabla f_{i_k}(x^{k})$$ (note that I wrote $f_{i_k}(x^{k})$ and not $f_{i_k}(x^{k-1})$ which I believe is a typo). and approximate $x^{k+1}-x^{k}\sim dx$ , and approximating the stochastic gradient term and treating it to be the same as the "true" gradient plus a noise term: $ \nabla f_{i_k}(x^{k})\sim \nabla f(x^{k})+\text{gaussian noise}$

As for the second question, I think this is the Van Kampen expansion.

Discrete-time dynamics can be modelled by the stochastic differential equation

There are 1 best solutions below

Related Questions in STOCHASTIC-PROCESSES

Related Questions in STOCHASTIC-CALCULUS

Related Questions in STOCHASTIC-ANALYSIS

Related Questions in STOCHASTIC-DIFFERENTIAL-EQUATIONS

Trending Questions

Popular # Hahtags

Popular Questions