Convergence of $\sum_{t=1}^{\infty}\mathbf{E}[X_{t+1}-X_{t}\mid\mathcal{F}_{t+1}]$


This arises from the proof of convergence of Stochastic (Online) Gradient Descent (SGD) in L. Bottou, 'Online Learning and Stochastic Approximations' (1998).

In it, he considers a stochastic process $X_t$ with increments $D_t = X_t - X_{t-1}$ satisfying $$ \mathbf{E}[D_t\mid\mathcal{F}_t] < C_1 C_2 \alpha^2_t \ \ \ \ \ \ \ \ \ (1) $$ and shows that $X_t$ is convergent because $$ \sum_{t=1}^{\infty} \mathbf{E}[D_t\mid\mathcal{F}_t] < C_1 C_2\sum_t\alpha^2_t< \infty. $$ Here $\alpha_t$ is the learning rate and $X_t$ is the loss function in a neural network/MLP framework, so $$ 0<X_t \overset{a.s.}{\to} X^{\infty}< \infty \ \ \ \text{(Equation 4.30)}, $$ which is then used to prove that the gradient of the loss function satisfies $\nabla X_t \overset{a.s.}{\to} 0$ (Equations 5.16-5.21), i.e. the algorithm converges to a local minimum.
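To make the summability condition concrete, here is a small numerical sketch (the constants $C_1$, $C_2$ and the choice $\alpha_t = 1/t$ are my own illustrative assumptions, not values from the paper): with $\alpha_t = 1/t$, the bound $C_1 C_2 \sum_t \alpha_t^2$ is finite, which is what makes the argument work.

```python
import numpy as np

# Hypothetical constants C1, C2 and a common summable learning-rate schedule
# alpha_t = 1/t; these are illustrative choices, not values from Bottou (1998).
C1, C2 = 1.0, 1.0
T = 100_000
alphas = 1.0 / np.arange(1, T + 1)

# The bound on the summed conditional expected increments: C1*C2*sum(alpha_t^2).
# Since sum 1/t^2 = pi^2/6, this partial sum stays bounded as T grows.
bound = C1 * C2 * np.sum(alphas**2)
print(bound)  # approaches pi^2/6 ≈ 1.6449 from below
```

Any schedule with $\sum_t \alpha_t^2 < \infty$ (e.g. $\alpha_t = 1/t$) would serve here; a constant learning rate would not, since the bound would diverge.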

My question is: what happens if we replace (1) with, for example, $$ \mathbf{E}[X_{t+1}-X_{t}\mid\mathcal{F}_t] \sim N(0,1)? $$ Will the convergence of $\nabla X_t$ still hold? My understanding is that
$$ \sum_{t=1}^{T}\mathbf{E}[X_{t+1}-X_{t}\mid\mathcal{F}_t] = 0, $$ since the conditions of the Fubini-Tonelli theorem are not a concern here, so the infinite sum of zero-mean expectations should still be $0$. If so, then by the same argument as in the paper, $X_t \overset{a.s.}{\to} X^{\infty} < \infty$ and $\nabla X_t \overset{a.s.}{\to}0$. Is this reasoning correct?
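One way to probe the proposed condition numerically is to simulate a process whose conditional expected increment is itself drawn as $N(0,1)$ at each step. The construction below is my own sketch (the split of each increment into an $\mathcal{F}_t$-measurable mean $\mu_t$ plus independent noise is an assumption, just one process satisfying the condition); whether the convergence argument survives is exactly the question being asked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of a process with E[X_{t+1} - X_t | F_t] = mu_t, where mu_t ~ N(0,1)
# is drawn before the step (so it is F_t-measurable) and the realized
# increment is mu_t plus independent zero-mean noise.
T, runs = 1_000, 2_000
mu = rng.standard_normal((runs, T))     # conditional means, one per step
noise = rng.standard_normal((runs, T))  # zero-mean fluctuation around mu_t
X_T = (mu + noise).sum(axis=1)          # X_T - X_0 for each simulated run

# Each step contributes variance Var(mu_t) + Var(noise_t) = 2, so the
# spread of X_T across runs grows like sqrt(2*T) rather than staying bounded.
print(X_T.std())  # close to sqrt(2*1000) ≈ 44.7
```

The sample spread of $X_T$ across runs grows with $T$ for this construction, which is worth comparing against the intuition that the summed conditional expectations "cancel to $0$".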