Gradient of Gaussian Smoothing


In Nesterov's "Random Gradient-Free Minimization of Convex Functions", a Gaussian smoothing of a continuous convex function with respect to parameter $\mu$ and covariance matrix $B^{-1} \succ 0$ is defined as:

$$ f_{\mu} (x) = \frac{1}{\kappa} \int_{\mathbb{R}^n} f(x + \mu u) e^{- \frac{1}{2} u^\top B u} \,\mathrm{d}u \tag{9} $$

where

$$ \kappa = \int_{\mathbb{R}^n} e^{- \frac{1}{2} u^\top B u} \,\mathrm{d}u \tag{10}.$$

The paper goes on to derive the gradient of this smoothed function,

$$ \nabla f_{\mu} (x) = \frac{1}{\mu \kappa} \int_{\mathbb{R}^n} f(x + \mu u) e^{- \frac{1}{2} u^\top B u} B u \,\mathrm{d} u \tag{21}, $$ which should mean that

$$ \nabla f_{\mu} (x) = \frac{1}{\mu} \mathbb{E}_{u \sim \mathcal{N}(0, B^{-1})} [ f(x + \mu u) B u]. $$
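As a sanity check on this reading of (21): for a linear $f$ the smoothing changes nothing, so the estimator $\frac{1}{\mu} f(x + \mu u) B u$ with $u \sim \mathcal{N}(0, B^{-1})$ should average to exactly $\nabla f$. A minimal Monte Carlo sketch in pure Python, using a diagonal $B$ so sampling $u \sim \mathcal{N}(0, B^{-1})$ is trivial (all the particular values of $a$, $B$, $x$, $\mu$ below are illustrative choices, not from the paper):

```python
import math
import random

random.seed(0)

# Illustrative setup: f(x) = a.x (linear), and a diagonal B so that
# u ~ N(0, B^{-1}) has independent components with variance 1/b_i.
a = [2.0, -3.0]          # gradient of f, which the estimator should recover
b = [4.0, 0.25]          # diagonal of B
x = [0.5, 1.0]
mu = 1.0
n_samples = 200_000

acc = [0.0, 0.0]
for _ in range(n_samples):
    u = [random.gauss(0.0, 1.0 / math.sqrt(bi)) for bi in b]
    fxu = sum(ai * (xi + mu * ui) for ai, xi, ui in zip(a, x, u))
    for i in range(2):
        acc[i] += fxu * b[i] * u[i]      # f(x + mu*u) * (B u)_i

grad_est = [s / (mu * n_samples) for s in acc]
# For linear f, (1/mu) E[f(x + mu*u) B u] = a exactly,
# so grad_est should be close to [2, -3].
```

Note that without the factor of $B$ in the integrand, the same estimator would average to $B^{-1} a$ instead of $a$, which is what makes the apparent conflict below concrete.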

However, this seems to conflict with the earlier claim (just below (5)) that

$$ \nabla f_{\mu} (x) = \frac{1}{\mu} \mathbb{E}_{u} [ f(x + \mu u) u ]. $$

I am wondering if I am missing something here. Are there some implicit assumptions that I'm not picking up on?

Note: a similar question (Partial derivative for Gaussian Smoothing) is almost relevant, but assumes $\mathcal{N} (0, I)$ instead of $\mathcal{N} (0, B^{-1})$.


Answering my own question for whoever stumbles upon this one day. The claim just below (5) implicitly assumes $B = I$, in which case $u \sim \mathcal{N}(0, I)$ and $Bu = u$, so (21) reduces to the isotropic formula and the two expressions agree. A different way to understand the smoothing is to use the following:

$$ f_{\sigma^2} (x) = \mathbb{E}_{w \sim \mathcal{N}(0, \sigma^2 I)} [f(x + w)] $$

which corresponds to the original formulation with $B = I$ and $w = \mu u$ (so $\sigma = \mu$), and is perhaps easier to intuit. Proceeding with this:

$$ \begin{eqnarray} \nabla f_{\sigma^2} (x) &=& \nabla \frac{1}{\kappa} \int_{\mathbb{R}^n} f(x + w) e^{-\frac{1}{2 \sigma^2} \lVert w \rVert^2} \,\mathrm{d}w \\ &=& \nabla \frac{1}{\kappa} \int_{\mathbb{R}^n} f(y) e^{-\frac{1}{2 \sigma^2} \lVert y - x \rVert^2} \,\mathrm{d} y \tag{$y = x + w$}\\ &=& \frac{1}{\sigma^2 \kappa} \int_{\mathbb{R}^n} f(y) (y - x) e^{-\frac{1}{2 \sigma^2} \lVert y - x \rVert^2} \,\mathrm{d} y \\ &=& \frac{1}{\sigma^2 \kappa} \int_{\mathbb{R}^n} f(x + w) \, w \, e^{-\frac{1}{2 \sigma^2} \lVert w \rVert^2} \,\mathrm{d} w \\ &=& \frac{1}{\sigma^2} \mathbb{E}_{w \sim \mathcal{N}(0, \sigma^2 I)} [f(x + w) \, w] \end{eqnarray}$$
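The last line is easy to check numerically. For $f(x) = \lVert x \rVert^2$ one has $f_{\sigma^2}(x) = \lVert x \rVert^2 + n \sigma^2$, so $\nabla f_{\sigma^2}(x) = 2x$, and the estimator $\frac{1}{\sigma^2} f(x + w) w$ should average to $2x$. A small pure-Python sketch (the particular $x$, $\sigma$, and sample count are illustrative):

```python
import random

random.seed(1)

# Check  grad f_{sigma^2}(x) = (1/sigma^2) E[f(x + w) w]  for f(x) = ||x||^2,
# whose smoothed gradient is exactly 2x.
x = [1.0, -0.5]
sigma = 0.5
n_samples = 400_000

acc = [0.0, 0.0]
for _ in range(n_samples):
    w = [random.gauss(0.0, sigma) for _ in x]
    fxw = sum((xi + wi) ** 2 for xi, wi in zip(x, w))   # f(x + w)
    for i in range(2):
        acc[i] += fxw * w[i]

grad_est = [s / (sigma ** 2 * n_samples) for s in acc]
# grad_est should be close to 2x = [2.0, -1.0].
```

The $\lVert x \rVert^2$ and $\lVert w \rVert^2$ terms of $f(x+w)$ contribute nothing in expectation (odd moments of $w$ vanish); only the cross term $2 x^\top w$ survives, which is exactly where the $\sigma^2$ in the denominator cancels.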