This arises from the proof of Stochastic (Online) Gradient Descent (SGD) convergence in L. Bottou, "Online Learning and Stochastic Approximations" (1998).
There he shows that a discrete-time stochastic process $X_t$ with increments $D_t = X_t - X_{t-1}$ satisfying $$ \mathbf{E}[D_t \mid \mathcal{F}_t] < C_1 C_2 \alpha^2_t \qquad (1) $$ is convergent, because $$ \sum_{t=1}^{\infty} \mathbf{E}[D_t \mid \mathcal{F}_t] < C_1 C_2 \sum_t \alpha^2_t < \infty. $$ Here $\alpha_t$ is the learning rate and $X_t$ is the loss function in a neural network/MLP framework, i.e. $$ 0 < X_t \overset{a.s.}{\to_t} X^{\infty} < \infty \qquad \text{(Equation 4.30)}. $$ This is then used to prove that the gradient of the loss function converges, $\nabla X_t \overset{a.s.}{\to_t} 0$ (Equations 5.16–5.21), i.e. the algorithm converges to a local minimum.
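As a quick side check (not part of the paper), the summability driving condition (1) can be verified numerically for a typical learning-rate schedule; the choice $\alpha_t = 1/t$ here is my own illustrative assumption:

```python
import math

# With the common schedule alpha_t = 1/t, the series sum_t alpha_t^2
# converges (to pi^2/6), so the bound C1*C2*sum_t alpha_t^2 in Bottou's
# argument is finite for any constants C1, C2.
partial = sum((1.0 / t) ** 2 for t in range(1, 1_000_001))
print(partial)            # partial sum up to t = 10^6
print(math.pi ** 2 / 6)   # the exact limit of the series
```

By contrast, the schedule $\alpha_t = 1/\sqrt{t}$ would make $\sum_t \alpha_t^2$ diverge, which is why the paper's conditions on $\alpha_t$ matter.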
My question is: what if we replace (1), for example, with
$$
\mathbf{E}[X_{t+1}-X_{t}|\mathcal{F}_t] \sim N(0,1)
$$
Will the convergence of $\nabla X_t$ still hold? What I understand is that
$$
\sum_{t=1}^{T}\mathbf{E}[X_{t+1}-X_{t}|\mathcal{F}_t] = 0
$$
since we do not need to worry about the conditions of the Fubini–Tonelli theorem, the sum of infinitely many zero-mean expectations is still $0$. If so, then by the same argument as in the paper, $X_t \overset{a.s.}{\to_t} X^{\infty} < \infty$ and $\nabla X_t \overset{a.s.}{\to} 0$. Is this reasoning correct?
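For intuition, here is a toy simulation under one particular (and much stronger) way of realizing the proposed condition: take the increment $X_{t+1}-X_t$ to be $\mathcal{F}_t$-measurable and drawn $N(0,1)$, so that $\mathbf{E}[X_{t+1}-X_t \mid \mathcal{F}_t] \sim N(0,1)$ exactly. This modeling choice is my assumption, not something the condition forces:

```python
import random

random.seed(0)

def simulate(T):
    # Toy process: each increment is an N(0,1) draw made before the step,
    # so the conditional mean of the increment is that N(0,1) value itself.
    # The resulting X_T is a Gaussian random walk with variance T.
    x = 0.0
    for _ in range(T):
        x += random.gauss(0.0, 1.0)
    return x

# Spread of endpoints across independent runs: it grows like sqrt(T)
# rather than shrinking, i.e. the paths do not settle toward a limit.
samples = [simulate(10_000) for _ in range(200)]
spread = max(samples) - min(samples)
print(spread)
```

Under this realization the process behaves like a random walk rather than a convergent sequence, which is the behavior the summability argument in the paper is designed to rule out.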