Stochastic gradient descent - convergence of iterates

Consider a strongly convex objective function $F(x)$ that is to be minimized, and let $x^*$ be its minimizer. At any point $x$, we only have access to an oracle that returns an unbiased estimator $\hat{g}$ of a subgradient at $x$, i.e. $\mathbb{E}[\hat g] \in \partial F(x)$. The variance of the estimator is bounded.

We use the standard stochastic gradient descent update rule $x_{t+1} = x_{t} - \eta_t \hat g_t$ where $\eta_t$ is the step size.

Consider the averaged iterate $\bar x = \frac{1}{T} \sum_{t=1}^T x_t$. Can we prove that, for any $\delta > 0$, the following statement holds with probability at least $1-\delta$? $$\| \bar x - x^* \| \le O\!\left(\frac{1}{\sqrt T}\right)$$

If not, what additional conditions do we need on the objective function or the oracle for this statement to hold?
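As a numerical illustration of the setting, here is a minimal sketch of averaged SGD on the strongly convex quadratic $F(x) = \tfrac{\mu}{2} x^2$ (minimizer $x^* = 0$), with a hypothetical oracle that adds zero-mean Gaussian noise to the gradient; the step size $\eta_t = 1/(\mu\sqrt{t})$ and the averaging scheme are illustrative choices, not prescribed by the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: F(x) = 0.5 * mu * x^2, strongly convex with minimizer x* = 0.
mu, sigma = 1.0, 1.0

def noisy_grad(x):
    # Unbiased gradient oracle: true gradient mu*x plus bounded-variance noise.
    return mu * x + sigma * rng.standard_normal()

def averaged_sgd(T, x0=5.0):
    """Run T steps of SGD and return the averaged iterate (1/T) * sum_t x_t."""
    x, xbar = x0, 0.0
    for t in range(1, T + 1):
        eta = 1.0 / (mu * np.sqrt(t))  # illustrative step size eta_t = 1/(mu*sqrt(t))
        x = x - eta * noisy_grad(x)
        xbar += (x - xbar) / t         # running average of the iterates
    return xbar

# Empirically, the mean distance |x_bar - x*| shrinks as T grows.
for T in (100, 10_000):
    errs = [abs(averaged_sgd(T)) for _ in range(200)]
    print(T, np.mean(errs))
```

This only shows the expected error shrinking on one toy problem; the question asks for the stronger high-probability statement, which is what the reference below addresses.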


In "Nemirovski, A., Juditsky, A., Lan, G., & Shapiro, A. (2009). Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4), 1574–1609,"

they work with the expected value, but the same argument can be carried out with conditional expectations, which gives almost sure convergence, i.e. convergence with probability $1$.