In bandit convex optimization, we are only given access to zeroth-order oracle of a function but not first-order (gradient) oracle. Hence, people often use some one-point gradient estimator to approximate the gradient of a function at a certain point. More specifically, let $f:\mathbb{R}^d\mapsto \mathbb{R}$ be a function in $\mathbb{R}^d$, $\mathbb{S}$ be the unit sphere in $\mathbb{R}^d$, $\mathbb{B}$ be the unit ball in $\mathbb{R}^d$. For a point $\mathbb{x}\in\mathbb{R}^d$, if we want to estimate $\nabla f(\mathbf{x})$, we randomly sample a $\mathbf{u}$ from $\mathbb{S}$. Let $\delta>0$ be a small positive scalar. We query the zeroth-order oracle and obtain $f(\mathbf{x}+\delta \mathbf{u})$. We then use $\frac{f(\mathbf{x}+\delta \mathbf{u})d\mathbf{u}}{\delta}$ as an approximation of $\nabla f(\mathbf{x})$. Lemma 1 in this paper proved that the expectation of the approximation is the gradient of a smoothed version of $f$, that is let $\hat{f}(\mathbf{x})=\mathbb{E}_{\mathbf{v}\in\mathbb{B}}[f(\mathbf{x}+\delta \mathbf{v})]$, then $\nabla \hat{f}(x)=\mathbb{E}_{\mathbf{u}\in \mathbb{S}}\big[\frac{f(\mathbf{x}+\delta \mathbf{u})d\mathbf{u}}{\delta}\big]$. The proof invoked Stokes's theorem to show the following (equation (15) in the paper): \begin{align} \nabla \int_{\delta \mathbb{B}}f(\mathbf{x}+\mathbf{v})\mathrm{d}\mathbf{v}=\int_{\delta \mathbb{S}}f(\mathbf{x}+\mathbf{u})\frac{\mathbf{u}}{\|\mathbf{u}\|}\mathrm{d}\mathbf{u}. \end{align} I'm trying to understand how exactly does the above follow from the Stokes's theorem. Is it that the curl of the vector field defined as $f(\mathbf{x}+\mathbf{u})\frac{\mathbf{u}}{\|\mathbf{u}\|}$ somehow is related to the left-hand-side?
2026-03-31 16:12:08.1774973528
One-point gradient estimator and Stokes' theorem.
273 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
1
There are 1 best solutions below
Related Questions in MULTIVARIABLE-CALCULUS
- Equality of Mixed Partial Derivatives - Simple proof is Confusing
- $\iint_{S} F.\eta dA$ where $F = [3x^2 , y^2 , 0]$ and $S : r(u,v) = [u,v,2u+3v]$
- Proving the differentiability of the following function of two variables
- optimization with strict inequality of variables
- How to find the unit tangent vector of a curve in R^3
- Prove all tangent plane to the cone $x^2+y^2=z^2$ goes through the origin
- Holding intermediate variables constant in partial derivative chain rule
- Find the directional derivative in the point $p$ in the direction $\vec{pp'}$
- Check if $\phi$ is convex
- Define in which points function is continuous
Related Questions in CONVEX-OPTIMIZATION
- Optimization - If the sum of objective functions are similar, will sum of argmax's be similar
- Least Absolute Deviation (LAD) Line Fitting / Regression
- Check if $\phi$ is convex
- Transform LMI problem into different SDP form
- Can a linear matrix inequality constraint transform to second-order cone constraint(s)?
- Optimality conditions - necessary vs sufficient
- Minimization of a convex quadratic form
- Prove that the objective function of K-means is non convex
- How to solve a linear program without any given data?
- Distance between a point $x \in \mathbb R^2$ and $x_1^2+x_2^2 \le 4$
Related Questions in GRADIENT-DESCENT
- Gradient of Cost Function To Find Matrix Factorization
- Can someone explain the calculus within this gradient descent function?
- Established results on the convergence rate of iterates for Accelerated Gradient Descent?
- Sensitivity (gradient) of function solved using RK4
- Concerning the sequence of gradients in Nesterov's Accelerated Descent
- Gradient descent proof: justify $\left(\dfrac{\kappa - 1}{\kappa + 1}\right)^2 \leq \exp(-\dfrac{4t}{\kappa+1})$
- If the gradient of the logistic loss is never zero, does that mean the minimum can never be achieved?
- How does one show that the likelihood solution for logistic regression has a magnitude of infinity for separable data (Bishop exercise 4.14)?
- How to determinate that a constrained inequality system is not empty?
- How to show that the gradient descent for unconstrained optimization can be represented as the argmin of a quadratic?
Related Questions in STOKES-THEOREM
- Stoke's Theorem on cylinder-plane intersection.
- Integration of one-form
- First part Spivak's proof of Stokes' theorem
- Stokes theorem, how to parametrize in the right direction?
- Surface to choose for the Stokes' theorem for intersection of sphere and plane.
- What is wrong with this vector integral?
- Circulation of a vector field through a cylinder inside a sphere.
- Stokes Theorem Equivalence
- Verify Stokes Theorem vector field $\vec{F} = (y, −x, xyz)$
- Evaluate $\iint \text{curl} (y\,\mathbf{i}+2\,\mathbf{j})\cdot n\; d\sigma$ using Stokes Theorem
Trending Questions
- Induction on the number of equations
- How to convince a math teacher of this simple and obvious fact?
- Find $E[XY|Y+Z=1 ]$
- Refuting the Anti-Cantor Cranks
- What are imaginary numbers?
- Determine the adjoint of $\tilde Q(x)$ for $\tilde Q(x)u:=(Qu)(x)$ where $Q:U→L^2(Ω,ℝ^d$ is a Hilbert-Schmidt operator and $U$ is a Hilbert space
- Why does this innovative method of subtraction from a third grader always work?
- How do we know that the number $1$ is not equal to the number $-1$?
- What are the Implications of having VΩ as a model for a theory?
- Defining a Galois Field based on primitive element versus polynomial?
- Can't find the relationship between two columns of numbers. Please Help
- Is computer science a branch of mathematics?
- Is there a bijection of $\mathbb{R}^n$ with itself such that the forward map is connected but the inverse is not?
- Identification of a quadrilateral as a trapezoid, rectangle, or square
- Generator of inertia group in function field extension
Popular # Hahtags
second-order-logic
numerical-methods
puzzle
logic
probability
number-theory
winding-number
real-analysis
integration
calculus
complex-analysis
sequences-and-series
proof-writing
set-theory
functions
homotopy-theory
elementary-number-theory
ordinary-differential-equations
circles
derivatives
game-theory
definite-integrals
elementary-set-theory
limits
multivariable-calculus
geometry
algebraic-number-theory
proof-verification
partial-derivative
algebra-precalculus
Popular Questions
- What is the integral of 1/x?
- How many squares actually ARE in this picture? Is this a trick question with no right answer?
- Is a matrix multiplied with its transpose something special?
- What is the difference between independent and mutually exclusive events?
- Visually stunning math concepts which are easy to explain
- taylor series of $\ln(1+x)$?
- How to tell if a set of vectors spans a space?
- Calculus question taking derivative to find horizontal tangent line
- How to determine if a function is one-to-one?
- Determine if vectors are linearly independent
- What does it mean to have a determinant equal to zero?
- Is this Batman equation for real?
- How to find perpendicular vector to another vector?
- How to find mean and median from histogram
- How many sides does a circle have?
WLOG, $\delta=1, x=0$. Pick arbitrary (constant) vector $c$. Let $F=f c$. Then by divergence theorem
$$\int_S f(u) c \cdot u\: d u = \int_B \operatorname{div} (f(v)c) \: dv=\int_B (\nabla f \cdot c) \: d v$$
or $$c \cdot \int_S f(u) u \:d u = c \cdot \int_B (\nabla f) \:d v$$
Since $c$ was arbitrary, $$\int_S f(u) u \: d u = \int_B (\nabla f) \:d v$$
(Note: you can use $c=e_1$ to get that the two vectors have the same first coordinate, then use $c=e_2$ to get that they have the same second coordinate etc. ; but I like the above coordinate-less way better.)
If $f$ is regular enough, you can exchange $\nabla$ and $\int$ and recover the formula in the question.
You don't need Stokes. Again, it clearly suffices to treat the case $\delta =1$ and $x=0$. Let's say our coordinates are $x_0, x_1, ..., x_{d-1}$ and we want to show that the $0$-th components of the LHS and RHS are the same. Writing $v=u_0 e_0+ v_{rest}$ we get that by Fubini's for a fixed scalar $s$
$$\int_{B^d} f(s e_0+v) \: dv =\int_{s-v_0(v_{rest})}^{s+v_0(v_{rest})} \left(\int_{B^{d-1}} f(u_0e_0+v_{rest}) \: d v_{rest}\right) d u_0$$
Here $v_0(v_{rest})=(1-|v_{rest}|^2)^{1/2}$.
LHS has $0$th component
$$ \frac{d}{ds }\int_{B^d} f(s e_0+v) \: dv$$ which by the above formula and the fundamental theorem of calculus is
$$\int_{B^{d-1}} \left(f(v_0(v_{rest})e_0+v_{rest}) -f(-v_0(v_{rest})e_0+v_{rest})\right)\: dv_{rest} $$
The "upper" hemisphere is parameterized by the inverse of the projection map to $B^{d-1}$, whose $(d-1)$-dimensional volume stretching factor at $v=v_0(v_{rest}) e_0 +v_{rest}$ is $ \frac{1}{v_0}$ (proof sketch: using $d-1$ dimensional polar coordinates on $B^{d-1}$ the angular tangent vectors are undistorted, and the radial tangent is stretched by $ \frac{1}{v_0}$, just as in the $d=2$ case). Hence (by definition of integration over a manifold)
$$\int_{B^{d-1}} \left(f(v_0(v_{rest})e_0+v_{rest}) \right)dv_{rest}=\int_{S^{d-1}_{upper}} f(v) v_0 \: dv. $$
Similarly,
$$\int_{B^{d-1}} \left(-f(v_0(v_{rest})e_0+v_{rest}) \right) \: dv_{rest}=\int_{S^{d-1}_{lower}} f(v) v_0\: dv .$$
Adding these we get $ \int_{S^{d-1}} f(v)v_0 \: dv$, aka the $0$th component of the RHS. The desired result follows.
The stretching factor in more detail:
We have $B^d\setminus 0 = S^{d-1} \times (0,1)$ via $(\vec{u}, r) \to r\vec{u}$. This is also diffeomorphic to punctured open hemisphere via $(\vec{u}, r) \to (r\vec{u}, \sqrt{1-r^2})\in S^d$.
The "inverse projection" in these coordinates is thus $i:r\vec{u}\to (r\vec{u}, \sqrt{1-r^2})$. We have that any tangent vector $\delta \vec{u}$ to $B^d\setminus 0$ which is orthogonal to $\vec{u}$ is mapped to the tangent vector to $S^{d}$ equal to $(\delta \vec{u}, 0)$. These tangent vectors are not stretched. The tangent vector $\vec{u}\delta r$ parallel to $u$ is mapped to $ (\vec{u}\delta r, (\frac{d(\sqrt{1-r^2})}{dr})\delta r)=(\vec{u}\delta r, -\frac{r}{\sqrt{1-r^2}}\delta r)$. The image is still orthogonal to all the $\delta \vec{u}$ which are perpendicular to $\vec{u}$, but now has length $\frac{1}{\sqrt{1-r^2}}\delta r $, i.e. is stretched by a factor of $\frac{1}{\sqrt{1-r^2}}$, which is also the volume stretch factor. In our old notation this is $\frac{1}{v_0}$.