A differentiable function $f:\mathbb{R}^n \to \mathbb{R}$ is said to have a Lipschitz continuous gradient if the following holds for some $L>0$: $$\|\nabla f(x) - \nabla f(y)\|\leq L\|x-y\|,\, \forall x,y \in \mathbb{R}^n$$
The above property immediately results in the following:
$$f(y) \leq f(x) + \langle \nabla f(x) , y-x \rangle + \frac{L}{2}\|y-x\|^2 \quad \forall x,y$$
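This implication is standard; one way to see it is via the fundamental theorem of calculus along the segment from $x$ to $y$, together with Cauchy–Schwarz and the Lipschitz bound:
$$ \begin{split} f(y)-f(x)-\langle \nabla f(x), y-x\rangle &= \int_0^1 \langle \nabla f(x+t(y-x))-\nabla f(x),\, y-x\rangle \, dt \\ &\leq \int_0^1 \|\nabla f(x+t(y-x))-\nabla f(x)\| \, \|y-x\| \, dt \\ &\leq \int_0^1 L t \|y-x\|^2 \, dt = \frac{L}{2}\|y-x\|^2. \end{split} $$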
Using the above property, if we have $x_{k+1}=x_{k}-\gamma \nabla f(x_k)$ and $\gamma <2/L$, we get the following: $$ f(x_{k+1}) + \left(\frac{1}{\gamma}-\frac{L}{2}\right)\|x_{k+1}-x_{k}\|^2 \leq f(x_{k}) $$ Since $\frac{1}{\gamma}-\frac{L}{2}>0$, the sequence $\{f(x_k)\}_{k \geq 0}$ is nonincreasing; assuming $f$ is bounded below, it is therefore convergent.
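For concreteness, this inequality follows by substituting $y=x_{k+1}$, $x=x_k$ into the quadratic upper bound and using $x_{k+1}-x_k=-\gamma \nabla f(x_k)$, i.e. $\nabla f(x_k) = -\frac{1}{\gamma}(x_{k+1}-x_k)$:
$$ f(x_{k+1}) \leq f(x_k) + \langle \nabla f(x_k), x_{k+1}-x_k \rangle + \frac{L}{2}\|x_{k+1}-x_k\|^2 = f(x_k) - \left(\frac{1}{\gamma}-\frac{L}{2}\right)\|x_{k+1}-x_k\|^2. $$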
Now define $x_{k+1}=x_{k}-\gamma g_k $ where $\mathbb{E}[g_k]=\nabla f(x_k)$.
Question: Can we show $\{\mathbb{E}[f(x_{k}) ]\}_{ k \geq 0}$ is convergent by finding a good $\gamma$ and assuming a condition like $\mathbb{E}[\|\nabla f(x_{k})-g_k\|^2] \leq \sigma^2$ or $\mathbb{E}[\|g_k\|^2] \leq \sigma^2$ for some $\sigma > 0$?
My try: I used $\mathbb{E}[\|\nabla f(x_{k})-g_k\|^2] \leq \sigma^2$ and a varying step length $\gamma_k$.
$$ \begin{split} f(x_{k+1}) &\leq f(x_k) + \langle \nabla f(x_k) , x_{k+1}-x_k \rangle + \frac{L}{2}\|x_{k+1}-x_k\|^2\\ f(x_{k+1}) &\leq f(x_k) - \gamma_k \langle \nabla f(x_k) , g_k \rangle + \frac{L\gamma_k^2}{2}\|g_k\|^2 \\ f(x_{k+1}) &\leq f(x_k) - \gamma_k \langle g_k , g_k \rangle + \gamma_k \langle g_k - \nabla f(x_k), g_k \rangle +\frac{L\gamma_k^2}{2}\|g_k\|^2 \end{split} $$ Let $\delta_k=g_k-\nabla f(x_k)$, so that $\langle \delta_k, g_k - \nabla f(x_k) \rangle = \|\delta_k\|^2$. Then $$ \begin{split} f(x_{k+1}) & \leq f(x_k) - \gamma_k \langle g_k , g_k \rangle + \gamma_k \langle \delta_k, g_k \rangle +\frac{L\gamma_k^2}{2}\|g_k\|^2\\ \bigg(\gamma_k - \frac{L\gamma_k^2}{2}\bigg)\|g_k\|^2 & \leq f(x_k) - f(x_{k+1}) + \gamma_k \langle \delta_k, g_k - \nabla f(x_k) \rangle + \gamma_k \langle \delta_k, \nabla f(x_k) \rangle\\ \bigg(\gamma_k - \frac{L\gamma_k^2}{2}\bigg)\|g_k\|^2 & \leq f(x_k) - f(x_{k+1}) + \gamma_k \| \delta_k\|^2 + \gamma_k \langle \delta_k, \nabla f(x_k) \rangle \end{split} $$ Taking the expectation (first conditionally on $x_k$, then the total expectation) we have:
$$ \begin{split} \bigg(\gamma_k - \frac{L\gamma_k^2}{2}\bigg)\mathbb{E}[\|g_k\|^2 ] & \leq \mathbb{E}[f(x_k)] - \mathbb{E}[f(x_{k+1})] + \gamma_k \mathbb{E}[\| \delta_k\|^2] + \gamma_k \mathbb{E}[ \langle \delta_k, \nabla f(x_k) \rangle]\\ \bigg(\gamma_k - \frac{L\gamma_k^2}{2}\bigg)\mathbb{E}[\|g_k\|^2 ] & \leq \mathbb{E}[f(x_k)] - \mathbb{E}[f(x_{k+1})] + \gamma_k \sigma^2 \end{split} $$ where the inner-product term vanishes because $\mathbb{E}[\langle \delta_k, \nabla f(x_k) \rangle] = \mathbb{E}\big[\langle \mathbb{E}[\delta_k \mid x_k], \nabla f(x_k) \rangle\big] = 0$, since $\mathbb{E}[g_k \mid x_k]=\nabla f(x_k)$.
Summing over $k=1,\dots,N$ and telescoping:
$$ \sum_{k=1}^N\bigg(\gamma_k - \frac{L\gamma_k^2}{2}\bigg)\mathbb{E}[\|g_k\|^2 ] \leq \mathbb{E}[f(x_1)] - \mathbb{E}[f(x_{N+1})] + \sigma^2 \sum_{k=1}^N \gamma_k $$
Is there any way to show that the expected squared $2$-norm of the stochastic gradient, $\mathbb{E}[\|g_k\|^2]$, decreases?
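Not an answer, but for intuition, here is a small simulation. All choices in it are illustrative assumptions, not part of the setup above: $f(x)=\frac{1}{2}\|x\|^2$ (so $L=1$), Gaussian noise $\delta_k$ with $\mathbb{E}[\|\delta_k\|^2]=\sigma^2$, and diminishing steps $\gamma_k = 1/(L\sqrt{k})$. It suggests that $\mathbb{E}[\|g_k\|^2]$ decreases only down to the noise floor $\sigma^2$, not to zero:

```python
import numpy as np

# Illustrative toy problem (an assumption, not from the question):
# f(x) = 0.5*||x||^2, so grad f(x) = x and L = 1.  The noise
# delta_k ~ N(0, (sigma^2/n) I) satisfies E[||delta_k||^2] = sigma^2.
rng = np.random.default_rng(0)
n, L, sigma = 10, 1.0, 0.5
runs, N = 1000, 200

sq_norm_g = np.zeros(N)              # Monte Carlo estimate of E[||g_k||^2]
for _ in range(runs):
    x = np.ones(n)                   # fixed starting point x_1
    for k in range(1, N + 1):
        grad = x                     # exact gradient of 0.5*||x||^2
        delta = rng.normal(0.0, sigma / np.sqrt(n), size=n)
        g = grad + delta             # unbiased stochastic gradient
        sq_norm_g[k - 1] += g @ g / runs
        x = x - (1.0 / (L * np.sqrt(k))) * g   # step gamma_k = 1/(L*sqrt(k))

# Here E[||g_k||^2] = E[||x_k||^2] + sigma^2, so it decays toward the
# noise floor sigma^2 = 0.25 rather than toward zero.
print(sq_norm_g[0], sq_norm_g[-1])
```

So at least in this toy model, $\mathbb{E}[\|g_k\|^2]$ cannot decrease below $\sigma^2$; any general argument would presumably only show decrease of $\mathbb{E}[\|\nabla f(x_k)\|^2]$, or of $\mathbb{E}[\|g_k\|^2]$ down to the noise level.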