Number of calls to oracle in Randomized Stochastic Gradient (RSG) algorithm

Question

Number of calls to oracle in Randomized Stochastic Gradient (RSG) algorithm

67 Views Asked by user494522 At 27 Mar 2026 - 9:17

In Stochastic First-and Zeroth-order Methods for Nonconvex, on page 10, the author claims that the number of calls to Stochastic First order Oracle (SFO) for finding an $(\epsilon, \Lambda)$ solution after disregarding a few constant factors, can be bounded by $$ \mathcal{O}(\frac{1}{\Lambda \epsilon} + \frac{\sigma^2}{\Lambda^2 \epsilon^2})\tag{1} $$ which apparently is a consequence of $(2.19)$. However, I have the following questions:

1- What is the relation between $\lambda$ and $\Lambda$, because form Markov inequity and $(2.14)$ we have $$ \text{Prob}\{ \| f(x_R) \|^2 \geq \lambda L \mathcal{B}_N \} \leq \frac{1}{\lambda}\tag{2.19} $$ So $$ 1 - \text{Prob}\{ \| f(x_R) \|^2 < \lambda L \mathcal{B}_N \} \leq \frac{1}{\lambda} $$ Since $\| f(x_R) \|^2$ is a continuous random variable, we have equality in the $\text{Prob}$ and $$ \text{Prob}\{ \| f(x_R) \|^2 \leq \lambda L \mathcal{B}_N \} \geq 1- \frac{1}{\lambda} $$ Is $\Lambda = \frac{1}{\lambda}$?

2- Is the number of calls the same as expected value of $R$? If so, by choosing constant stepsize pmf of $R$ is uniform and it becomes $N/2$, then how we are getting bound $(1)$?

3- From which formula $(1)$ is derived and what are the steps?

Note: Please answer all the questions I have not partially. Also, please help me to walk through all the steps deriving $(1)$?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

That is a very interesting paper. I will provide a "brief" derivation but the steps in between can be derived rather easily.

1,2,3 - By Markov's bound, you have

$\text { Prob }\left\{\|\nabla f(\bar{x})\|^{2} \leq \epsilon\right\} \geq 1-\Lambda,$

or in other words, by looking at the "complement" of that equation, you get

$\text { Prob }\left\{\|\nabla f(\bar{x})\|^{2} \geq \epsilon\right\} \geq \Lambda.$

Now all that is left to do is to compare the above equation with equation (2.19) in the paper. Therefore, you get

$\epsilon = \lambda L \mathcal{B}_{N}$ and $\Lambda = \frac {1}{\lambda}.$

Once you have these equations you are basically there. Now use the above two equations in combination with (2.14) in your paper to write

$L\left(\frac{LD_{f}^{2}}{N}+\left(\tilde{D}+\frac{D_{f}^{2}}{\tilde{D}}\right)\frac{\sigma}{\sqrt{N}}\right)=\epsilon\Lambda.$

Define constants $C_{1}=D_{f}^{2}$ and $C_{2}:=\tilde{D}+\frac{D_{f}^{2}}{\tilde{D}}$ and rewrite the above equation to get

$C_{1}\frac{L}{N}+C_{2}\frac{\sigma}{\sqrt{N}}=\frac{\epsilon\Lambda}{L}.$

This equation is a quadratic in terms of $N$. Solve the quadratic for $N$ and you will get

$N \sim \mathcal{O}\left\{\frac{1}{\Lambda \epsilon}+\frac{\sigma^{2}}{\Lambda^{2} \epsilon^{2}}\right\},$

which is what you are looking for. Good luck!

Number of calls to oracle in Randomized Stochastic Gradient (RSG) algorithm

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in OPTIMIZATION

Related Questions in NON-CONVEX-OPTIMIZATION

Trending Questions

Popular # Hahtags

Popular Questions