Estimating $\|\nabla^2 f(x)\|_\mathsf{F}$ and $\mathsf{tr}\,\nabla^2 f(x)$ from the gradient.

89 Views Asked by At

As the title states, are there ways to frame $\|\nabla^2 f(x)\|_\mathsf{F}^2$, the (squared) Frobenius norm of the Hessian of a function $f$, and $\mathsf{tr}\,\nabla^2 f(x)$ in the form of some natural limit of the gradient $\nabla f$?

If nothing comes to mind for the Frobenius norm, how about estimating $\|\nabla^2 f(x)\|_2$, the spectral norm, from the gradient?

1

There are 1 best solutions below

0
On BEST ANSWER

Frobenius Norm Estimation: Denote by $\mathcal{H}$ the underlying Hilbert space of (finite) dimension $n$. Note that $$\|A\|_\mathsf{F}^2 = \sum_{i=1}^n\|Ae_i\|^2 = n\mathbb{E}[\|Ae_i\|^2]$$ where $\{e_i\}$ is any orthonormal basis for $\mathcal{H}$. Now since this expression is independent of basis we have $$\|A\|_\mathsf{F}^2 = n\mathbb{E}[\|Au\|^2],$$ this time taking the expectation over all unit vectors in $\mathcal{H}$. It follows that $$\|A\|_\mathsf{F}^2 \simeq n\frac{\|Ax\|^2}{\|x\|^2}$$ is an unbiased estimator of the Frobenius norm of $A$ if $x$ is sampled uniformly from $\mathcal{H}$. Now note that $$\frac{\nabla f(y) - \nabla f(x)}{\|y-x\|} = \frac{\nabla^2 f(x) (y-x)}{\|y-x\|} + o(1)$$ by definition (where $f\in o(1)$ here means $\lim_{y\to x}f(y)=0$). This implies $$\frac{\|\nabla f(y) - \nabla f(x)\|^2}{\|y-x\|^2} = \frac{\|\nabla^2 f(x) (y-x)\|^2}{\|y-x\|^2} + o(1)$$ by a triangle inequality argument. We conclude that $$\|\nabla^2 f(x)\|_\mathsf{F}^2 \simeq n\frac{\|\nabla f(y) - \nabla f(x)\|^2}{\|y-x\|^2} + o(1)$$ is an unbiased estimator of $\|\nabla^2 f(x)\|_\mathsf{F}^2$ if $y$ is sampled uniformly around $x$.


Trace Estimation: In spirit this is similar to the above. We have $$\mathsf{tr}\,A = \sum_{i=1}^n\langle Ae_i,e_i\rangle = n\mathbb{E}[\langle Ae_i,e_i\rangle] = n\mathbb{E}[\langle Au,u\rangle]$$ where the second to last expectation is taken with respect to an orthonormal basis $\{e_i\}$ and the last expectation is taken over all unit vectors $u$ in $\mathcal{H}$ since the trace is independent of basis. This gives the unbiased estimator $$\mathsf{tr}\,A \simeq n\frac{\langle Ax,x\rangle}{\langle x,x\rangle}$$ for the trace of $A$ if $x$ is sampled uniformly from $\mathcal{H}$. The same definition of the Hessian tells us $$\mathsf{tr}\,\nabla^2 f(x) \simeq n\frac{\langle\nabla f(y) - \nabla f(x),y-x\rangle}{\|y-x\|^2} + o(1)$$ forms an unbiased estimate of the trace of $\nabla^2 f(x)$ is $y$ is sampled uniformly around $x$.