Approximate gradient with a single function evaluation

73 Views Asked by Bumbble Comm At 29 Mar 2026 - 8:13

This question is about Lemma 1 of the paper "Online convex optimization in the bandit setting: gradient descent without a gradient". The lemma can be stated as:

Let $a\mathbb{B} := \{ x \in \mathbb{R}^d \ | \ \|x\|\leq a \}$ and $a\mathbb{S} := \{ x \in \mathbb{R}^d \ | \ \|x\|=a \}$. Then, for any constant $\delta>0$, with $u$ drawn uniformly from the unit sphere $1\mathbb{S}$ and $v$ drawn uniformly from the unit ball $1\mathbb{B}$, $$ \mathbf{E}_{u\in 1\mathbb{S}}[f(x+\delta u)u] = \frac{\delta}{d} \nabla (\mathbf{E}_{v\in 1\mathbb{B}}[f(x+\delta v)]). $$
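To make the statement concrete, here is a rough Monte Carlo sanity check in Python (standard library only). Everything in it is my own assumption for illustration: the test function `f`, the point `x`, the value of `delta`, and the sample size are not from the paper. For this particular quadratic `f`, the ball average $\mathbf{E}_{v\in 1\mathbb{B}}[f(x+\delta v)]$ differs from $f(x)$ only by a constant, so its gradient is just $\nabla f(x)$, which gives a closed-form right-hand side to compare against.

```python
import math
import random

def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d (normalized Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [gi / norm for gi in g]

def f(x):
    """Hypothetical test function (an assumption): x0^2 + 2*x1^2 + 3*x2."""
    return x[0] ** 2 + 2.0 * x[1] ** 2 + 3.0 * x[2]

def grad_f(x):
    """Exact gradient of the test function above."""
    return [2.0 * x[0], 4.0 * x[1], 3.0]

def sphere_estimate(func, x, delta, n_samples, rng):
    """Monte Carlo estimate of E_{u in S}[ func(x + delta*u) * u ]."""
    d = len(x)
    total = [0.0] * d
    for _ in range(n_samples):
        u = sample_unit_sphere(d, rng)
        value = func([xi + delta * ui for xi, ui in zip(x, u)])
        for i in range(d):
            total[i] += value * u[i]
    return [t / n_samples for t in total]

rng = random.Random(0)          # fixed seed so the check is reproducible
x = [0.3, -0.2, 0.1]            # arbitrary evaluation point
delta = 0.5
d = len(x)

lhs = sphere_estimate(f, x, delta, 100_000, rng)
# For this quadratic f, grad E_B[f(x + delta*v)] equals grad f(x).
rhs = [delta / d * gi for gi in grad_f(x)]

print("sphere estimate:", lhs)
print("(delta/d) * grad:", rhs)
```

The two printed vectors should agree up to sampling noise. This only checks the identity for one hand-picked function; it is not a proof.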

First, for the case $d=1$, I understand that we have $$ \mathbf{E}_{u\in 1\mathbb{S}}[f(x+\delta u)u] = \frac{1}{2}f(x+\delta) - \frac{1}{2}f(x-\delta) $$ and $$ \mathbf{E}_{v\in 1\mathbb{B}}[f(x+\delta v)] = \int_{-1}^1 f(x+\delta v) \frac{1}{2} \ dv = \frac{1}{2\delta} \int_{-\delta}^\delta f(x+w) \ dw = \frac{1}{2\delta} F(x+\delta) - \frac{1}{2\delta} F(x-\delta), $$ where $\frac{d}{dy} F(y) := f(y)$. However, in the paper they claim that $\frac{d}{dx} F(x+\delta) = f(x+\delta)$, which lets us write $$ \mathbf{E}_{u\in 1\mathbb{S}}[f(x+\delta u)u] = \frac{1}{2}f(x+\delta) - \frac{1}{2}f(x-\delta) = \frac{d}{dx} \left(\frac{1}{2}F(x+\delta) - \frac{1}{2}F(x-\delta) \right) = \delta \frac{d}{dx} \mathbf{E}_{v\in 1\mathbb{B}}[f(x+\delta v)]. $$
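The one-dimensional expressions above are easy to check numerically. The sketch below does so for a hypothetical test function $f(y) = y^3$ with antiderivative $F(y) = y^4/4$ (both my own choice, as are `x`, `delta`, and the finite-difference step `h`). It compares the sphere average against $\delta$ times a finite-difference derivative of the ball average, and also compares $\frac{d}{dx} F(x+\delta)$ against $f(x+\delta)$.

```python
def f(y):
    """Hypothetical test function (an assumption, not from the paper)."""
    return y ** 3

def F(y):
    """An antiderivative of f, so that F'(y) = f(y)."""
    return y ** 4 / 4.0

def sphere_average(x, delta):
    """E_{u in {-1,+1}}[ f(x + delta*u) * u ] in one dimension."""
    return 0.5 * f(x + delta) - 0.5 * f(x - delta)

def ball_average(x, delta):
    """E_{v in [-1,1]}[ f(x + delta*v) ], written via the antiderivative F."""
    return (F(x + delta) - F(x - delta)) / (2.0 * delta)

def derivative(g, x, h=1e-5):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

x, delta = 0.7, 0.3

lhs = sphere_average(x, delta)
rhs = delta * derivative(lambda t: ball_average(t, delta), x)

# d/dx of F(x + delta), approximated numerically, versus f(x + delta).
shifted = derivative(lambda t: F(t + delta), x)

print(lhs, rhs)
print(shifted, f(x + delta))
```

In each printed pair the two numbers should agree up to finite-difference error. That is consistent with the paper's claim for this particular $f$, but it does not explain why the claim holds in general.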

The first part of the question is: how can we conclude that $\frac{d}{dy} F(y) = f(y)$ implies $\frac{d}{dx} F(x+\delta) = f(x+\delta)$? Is it because $\delta$ is a constant? Or am I missing something?

For the general $d$-dimensional case, the authors state that $$ \mathbf{E}_{v\in 1\mathbb{B}}[f(x+\delta v)] = \frac{\int_{\delta \mathbb{B}} f(x+v) \ dv}{\text{vol}_d (\delta \mathbb{B})} $$ and $$ \mathbf{E}_{u\in 1\mathbb{S}}[f(x+\delta u)u] = \frac{\int_{\delta \mathbb{S}} f(x+u) \frac{u}{\|u\|} \ du}{\text{vol}_{d-1} (\delta \mathbb{S})}. \tag{1} $$ Then, from Stokes' theorem, $$ \nabla \int_{\delta \mathbb{B}} f(x+v) \ dv = \int_{\delta \mathbb{S}} f(x+u) \frac{u}{\|u\|} \ du. \tag{2} $$ Combining these equations, we get $$ \text{vol}_d (\delta \mathbb{B}) \nabla \mathbf{E}_{v\in 1\mathbb{B}}[f(x+\delta v)] = \text{vol}_{d-1} (\delta \mathbb{S}) \mathbf{E}_{u\in 1\mathbb{S}}[f(x+\delta u)u]. $$ The lemma then follows from the fact that $\frac{\text{vol}_d (\delta \mathbb{B})}{\text{vol}_{d-1} (\delta \mathbb{S})} = \frac{\delta}{d}$.
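The final ratio can be checked directly from the standard closed forms $\text{vol}_d(r\mathbb{B}) = \pi^{d/2} r^d / \Gamma(d/2+1)$ and $\text{vol}_{d-1}(r\mathbb{S}) = 2\pi^{d/2} r^{d-1} / \Gamma(d/2)$. A minimal Python sketch follows; the helper names and the value of `delta` are my own choices for illustration.

```python
import math

def ball_volume(d, r):
    """Volume of the d-dimensional ball of radius r."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def sphere_area(d, r):
    """(d-1)-dimensional surface area of the sphere of radius r in R^d."""
    return 2.0 * math.pi ** (d / 2) * r ** (d - 1) / math.gamma(d / 2)

delta = 0.25  # arbitrary radius for the check

# Ratio vol_d(delta*B) / vol_{d-1}(delta*S) for several dimensions.
ratios = {d: ball_volume(d, delta) / sphere_area(d, delta) for d in range(1, 8)}

for d, ratio in ratios.items():
    print(d, ratio, delta / d)
```

For each $d$ the last two printed columns should coincide up to floating-point rounding. This matches $\Gamma(d/2+1) = \frac{d}{2}\Gamma(d/2)$, which reduces the ratio to $\delta/d$.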

The second part of the question is: how do you arrive at Equation (1)? I assume $\text{vol}_{d-1} (\delta \mathbb{S})$ refers to the surface area of the sphere $\delta \mathbb{S}$, i.e. its $(d-1)$-dimensional volume, but I don't see why it appears there. Moreover, how do you arrive at Equation (2) from Stokes' theorem?
