Efficient methods for numerical derivatives during gradient descent?


Assume that I have a high-dimensional variable $x \in \mathbb{R}^N$ and a smooth function $f: \mathbb{R}^N \rightarrow \mathbb{R}$. My task is to minimize this function with gradient descent. For this, I usually require local estimates of the partial derivatives $\frac{\partial f(x)}{\partial x_k}$ for all $k \in \{1,\dots,N\}$. Assume further that I cannot compute these partial derivatives analytically, and instead have to approximate them through finite differences, e.g.:

$$\frac{\partial f(x)}{\partial x_k} \approx \frac{f(x+\epsilon_k)-f(x)}{\Vert{\epsilon_k}\Vert} \quad \forall k \in \{1,\dots,N\},$$

where $\epsilon_k$ is an $N$-dimensional vector that is zero everywhere except for a small non-zero entry (with $\Vert{\epsilon_k}\Vert\rightarrow0$) in the $k$th component. Finally, assume that evaluating $f(x)$ is computationally very expensive.

Now if I have to iterate many times to find a minimum, and each iteration requires $N+1$ evaluations of $f$ (once at $x$ itself, and once for each perturbed component), this can quickly become intractable, especially for large $N$.
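To make the cost concrete, here is a minimal sketch of the naive forward-difference gradient described above (the function name `fd_gradient` and step size `h` are my own choices, not a standard API):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate: costs N+1 evaluations of f."""
    fx = f(x)                      # 1 evaluation at the base point
    grad = np.empty_like(x)
    for k in range(x.size):        # N additional perturbed evaluations
        e = np.zeros_like(x)
        e[k] = h                   # perturb only the k-th component
        grad[k] = (f(x + e) - fx) / h
    return grad

# Example: f(x) = ||x||^2, whose exact gradient is 2x.
f = lambda x: np.dot(x, x)
x = np.array([1.0, 2.0, 3.0])
print(fd_gradient(f, x))  # approximately [2, 4, 6]
```

Each gradient-descent step then pays the full $N+1$ evaluations, which is the cost I would like to reduce.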

However, I expect that there are more efficient methods for obtaining the derivatives that would reduce this naive computational effort. For instance, if the step size is small, the gradients at consecutive iterates should be quite similar, which might permit a cheap update of the previous derivatives rather than a full recomputation.
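As one way to make this idea precise, a sketch of what I have in mind (my own ad hoc construction, not a technique I have a reference for): refresh only a random fraction of the partial derivatives each iteration and keep the stale values for the rest, so the per-iteration cost drops from $N+1$ to roughly $1 + \mathrm{frac}\cdot N$ evaluations.

```python
import numpy as np

def lazy_fd_gradient(f, x, grad_prev, h=1e-6, frac=0.25, rng=None):
    """Hypothetical sketch: recompute only a random subset of the partial
    derivatives and reuse the previous (stale) values for the others."""
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)                      # 1 evaluation at the base point
    grad = grad_prev.copy()        # start from the stale gradient
    n_refresh = max(1, int(frac * x.size))
    for k in rng.choice(x.size, size=n_refresh, replace=False):
        e = np.zeros_like(x)
        e[k] = h
        grad[k] = (f(x + e) - fx) / h
    return grad
```

I do not know whether this particular scheme converges reliably, which is part of why I am asking for references.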

I expect that this is a common problem, and thus wanted to collect a few references to relevant books or articles, or pointers to techniques that might help with this issue.