Complexity of differentiation (numeric vs. automatic/algorithmic).


I am reading about automatic differentiation and wonder how the complexity of automatic (algorithmic) differentiation compares directly with that of numerical (finite-difference) differentiation.

If we have a function $f:\mathbb{R}^n \rightarrow \mathbb{R}^m$ whose arithmetic cost of evaluation is $\text{Cost}(f) = C$, and we want to compute the Jacobian matrix $\mathbf{J}$, are the following temporal complexities correct?

  1. Numeric differentiation:

$$ \mathcal{O}(C \cdot n \cdot m) $$

  2. Forward (tangent) mode automatic differentiation:

$$ \mathcal{O}(C \cdot n) $$

  3. Reverse (adjoint) mode automatic differentiation:

$$ \mathcal{O}(C \cdot m) $$

I am also interested in the spatial complexities, and have read that the spatial complexity of reverse mode automatic differentiation is proportional to the cost of evaluating the function. However, I have yet to find a concise summary of the spatial complexities of the three approaches.
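To make the cost accounting in item 1 concrete, here is a minimal finite-difference Jacobian sketch (my own illustration, not from the post; `jacobian_fd` is a made-up helper name, and one-sided forward differences are assumed). It uses $n+1$ evaluations of $f$, each of cost $C$, so the work is $\mathcal{O}(C \cdot n)$ regardless of $m$, since one evaluation yields all $m$ outputs:

```python
import numpy as np

def jacobian_fd(f, x, h=None):
    """Forward-difference Jacobian of f: R^n -> R^m.

    Uses n + 1 evaluations of f, so the arithmetic cost is
    O(C * n), where C is the cost of one evaluation of f.
    """
    x = np.asarray(x, dtype=float)
    if h is None:
        # Rule of thumb for forward differences: h ~ sqrt(machine epsilon)
        h = np.sqrt(np.finfo(float).eps)
    f0 = np.atleast_1d(f(x))          # one baseline evaluation
    J = np.empty((f0.size, x.size))
    for j in range(x.size):           # n perturbed evaluations
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.atleast_1d(f(xp)) - f0) / h
    return J

# Example: f(x, y) = (x*y, x + y); exact Jacobian is [[y, x], [1, 1]]
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
print(jacobian_fd(f, [2.0, 3.0]))
```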

---
Numerical differentiation has about the same cost as forward-mode automatic differentiation.

Forward AD costs somewhat more, as it has to treat the function as an evaluation recipe over some "dual number" data type, where the "imaginary" part of the dual number is a vector. So in some sense it replaces a compiled function with an interpreted one. Source-code transformation methods largely eliminate this overhead, but are not as flexible as methods that use computational graphs and tapes as actual data structures.
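The dual-number idea can be sketched as follows (a minimal illustration under my own naming, not a production AD tool; only `+`, `*`, and `sin` are overloaded, and the "imaginary" part here is a scalar tangent rather than a vector):

```python
import math

class Dual:
    """Dual number a + b*eps with eps^2 = 0; `dot` carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule happens automatically: (uv)' = u v' + u' v
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def sin(x):
    """sin over duals: chain rule (sin u)' = cos(u) * u'."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return math.sin(x)

# Evaluate y = x*sin(x) + x and its derivative at x = 2 in one pass.
x = Dual(2.0, 1.0)   # seed the tangent direction
y = x * sin(x) + x   # y.dot holds sin(2) + 2*cos(2) + 1
```

Note how every elementary operation on the value is accompanied by a derivative update, which is exactly why the cost stays a constant multiple of the original evaluation.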

Numerical differentiation also has stability issues. With central differences the optimal step size is proportional to $\mu^{1/3}$, where $\mu$ is the machine epsilon, for instance $h=(1+|x|)\mu^{1/3}$, giving a relative error of the derivative $\sim \mu^{2/3}$. In contrast, AD has the same error mechanism as direct evaluation: the relative error is roughly $3M\mu$, where $M$ is the number of elementary operations and function evaluations in the given function, so that $M\mu$ is an estimate of the relative error of the bare function evaluation.
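The step-size trade-off is easy to check empirically (my own sketch; $f = \exp$ at $x = 1$ is an arbitrary test case, chosen because the exact derivative is known):

```python
import numpy as np

mu = np.finfo(float).eps      # machine epsilon, about 2.2e-16
f, df = np.exp, np.exp        # test function and its exact derivative
x = 1.0

def central_diff(h):
    return (f(x + h) - f(x - h)) / (2 * h)

# Step size proportional to mu^(1/3), as in h = (1 + |x|) * mu^(1/3)
h_opt = (1 + abs(x)) * mu ** (1 / 3)
err_opt = abs(central_diff(h_opt) - df(x)) / abs(df(x))
# err_opt should be on the order of mu^(2/3); a much smaller step
# makes rounding error dominate, a much larger one truncation error.
```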