I am writing a computer program that searches for the minimum of a multivariate function $f: \mathbb{R}^n \to \mathbb{R}$. This function is in fact the sum of many functions:
$$f(x) = \sum_{i=1}^m f_i(x)$$
The computation starts from some guess $x_0$ of a minimum point, and at each iteration $j$ it improves the current guess $x_j$ into a better one, $x_{j+1}$ (using some variant of Newton's method or steepest descent). As $x_j$ approaches the minimum, the gradient approaches 0. The problem is that to compute the gradient, I need to sum up many gradients, like this:
$$\nabla f = \sum_{i=1}^m \nabla f_i$$
and the $\nabla f_i$'s are nonzero. The fact that their sum is close to zero means that if we compute this sum in floating-point arithmetic, we accumulate a large rounding error and the computation is very inaccurate. When I try to approximate the gradient by evaluating $f$ at different points near $x_j$, the approximate gradient indeed comes out quite different from the computed gradient. (By comparison, at points where $\nabla f$ is far from zero, the computed and approximate gradients agree closely.)
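The cancellation is easy to reproduce in a tiny sketch. The "gradient components" below are made-up numbers of wildly different magnitudes whose true sum is 2; `math.fsum` serves as an exactly rounded reference:

```python
import math

# Made-up "gradient components" of very different magnitudes; the huge
# terms cancel exactly in real arithmetic, so the true sum is 2.0.
terms = [1.0, 1e100, 1.0, -1e100]

naive = sum(terms)           # left-to-right accumulation loses the small terms
accurate = math.fsum(terms)  # exactly rounded floating-point sum

print(naive)     # 0.0 -- the small contributions were absorbed and lost
print(accurate)  # 2.0
```

The same effect, in a milder form, is what corrupts the computed gradient near the minimum: each partial sum is large compared to the final result, so the rounding error swamps the answer.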
Does anyone know of a good method to compute the gradient near extremal points?
There are basically three methods to provide the gradients to an optimization algorithm:

1. You provide a function `gradF(x1, x2, ...)` in which you implement the analytic form of the gradient.
2. You use finite differences to numerically approximate the gradient.
3. You use symbolic or automatic differentiation (i.e. you generate the gradient function implementation from the implementation of your objective function).
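For the finite-difference option, a central-difference stencil is a common starting point because its truncation error is $O(h^2)$ rather than $O(h)$. Here is a minimal sketch; the function name and step size are illustrative, not from any particular library:

```python
def grad_central(f, x, h=1e-6):
    """Approximate the gradient of f at point x (a list of floats)
    with central differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h  # perturb coordinate i forward
        xm = list(x); xm[i] -= h  # perturb coordinate i backward
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

# Example: f(x, y) = x^2 + 3y has gradient (2x, 3).
f = lambda v: v[0] ** 2 + 3.0 * v[1]
print(grad_central(f, [1.0, 2.0]))  # approximately [2.0, 3.0]
```

Note that near the minimum this suffers from the same cancellation the question describes: $f(x + h e_i)$ and $f(x - h e_i)$ are nearly equal, so the choice of $h$ trades truncation error against rounding error.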
Also, you have to understand the behaviour of your function and adjust the termination conditions, such as the function tolerance (i.e. $f(x_i)-f(x_{i-1})$, where $i$ is the step number) and the value tolerance (i.e. $x_i-x_{i-1}$).
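A stopping test combining the two tolerances might look like the following sketch (the function name, defaults, and the choice of the max-norm for the step are illustrative):

```python
def converged(f_prev, f_curr, x_prev, x_curr, f_tol=1e-9, x_tol=1e-9):
    """Stop when both the objective decrease and the step size fall
    below their tolerances. Tolerance values are illustrative and
    should be tuned to the scale of the problem."""
    df = abs(f_curr - f_prev)                            # function tolerance
    dx = max(abs(a - b) for a, b in zip(x_curr, x_prev)) # value tolerance
    return df < f_tol and dx < x_tol
```

Relying on such tolerances instead of a gradient-norm test sidesteps the inaccurate gradient near the minimum, at the cost of possibly stopping early on flat regions.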