Context: In a lot of my self-study, I come across ways to solve problems that involve optimizing some objective function. (I come from a signal processing background.)
Anyway, I seem to be learning about those methods as I trudge along, in that they are introduced only when there is a problem at hand. This is fine, I suppose, but I do wish I had more of an overarching mental 'map' of them in my head.
For example, I know about the gradient ascent/descent algorithms, and I have also recently learned about a second-order method, Newton's method, which uses curvature information. (I will post a different question about it.)
Gradient Ascent/Descent:
$$ \mathbf{w}_{n+1} = \mathbf{w}_n \pm \alpha\,\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} $$
Newton's Method:
$$ \mathbf{w}_{n+1} = \mathbf{w}_n \pm \left[ \frac{\partial^2 J(\mathbf{w})}{\partial \mathbf{w}^2} \right]^{-1} \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} $$
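To make the two update rules above concrete, here is a minimal sketch on a toy quadratic cost $J(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^\top A \mathbf{w} - \mathbf{b}^\top \mathbf{w}$, where the gradient and Hessian are known in closed form (the matrix `A` and vector `b` are made-up illustration values, not from the question):

```python
import numpy as np

# Toy quadratic cost J(w) = 0.5 * w^T A w - b^T w, minimized at w* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b   # first derivative of J

def hess(w):
    return A           # second derivative of J (constant for a quadratic)

# Gradient descent: w <- w - alpha * grad(w)
w = np.zeros(2)
alpha = 0.1
for _ in range(200):
    w = w - alpha * grad(w)

# Newton's method: w <- w - [hess(w)]^{-1} grad(w)
# (solve the linear system instead of forming the inverse explicitly)
v = np.zeros(2)
for _ in range(5):
    v = v - np.linalg.solve(hess(v), grad(v))
```

For a quadratic cost, Newton's method lands on the minimizer in a single step, while gradient descent approaches it geometrically at a rate set by the step size and the curvature.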
Questions:
1) First, could you summarize other optimization methods that take the same form as the above, i.e. methods that require one to compute the first and/or second derivatives of the cost function a priori?
2) Second, what optimization methods exist that don't require an explicit equation for the cost function, or for its first/second derivatives? For example, suppose I have a black-box cost function that I want to use in an optimization procedure. I would not know its explicit equation or any of its derivatives; it would exist simply as cost = blackbox_cost_function(weight vector, data);. What method(s) would be available to me in this case?
Thanks!
Here are my answers to your questions:
(1) To this list I would also add the non-linear conjugate gradient method, quasi-Newton methods (e.g. L-BFGS), and interior-point methods; there are many more numerical optimization algorithms that use gradient information. Note also that it is common practice in optimization to approximate the gradient by finite differences when you don't have its functional form.
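The finite-difference trick mentioned above can be sketched in a few lines; this is a central-difference approximation (the test function and step size `eps` are illustrative choices, not prescribed by anything here):

```python
import numpy as np

def fd_gradient(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at w."""
    g = np.zeros_like(w, dtype=float)
    for i in range(len(w)):
        e = np.zeros_like(w, dtype=float)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Sanity check: f(w) = sum(w^2) has gradient 2w.
f = lambda w: np.sum(w ** 2)
w0 = np.array([1.0, -2.0, 3.0])
g0 = fd_gradient(f, w0)
```

Each gradient estimate costs 2n function evaluations for an n-dimensional weight vector, which is why derivative-free methods can be attractive when evaluations are expensive.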
(2) Have a look at the Nelder-Mead algorithm, which performs unconstrained optimization without using any derivative information. You will still need to evaluate the objective function at each iteration, though.
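As a concrete sketch, SciPy exposes Nelder-Mead through `scipy.optimize.minimize`; the `blackbox_cost` below is just a stand-in (a Rosenbrock cost) for the questioner's opaque cost function, chosen only so there is something to optimize:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for a black-box cost: only function values are available,
# no derivatives. (Rosenbrock function, with known minimizer at [1, 1].)
def blackbox_cost(w):
    return (1 - w[0]) ** 2 + 100 * (w[1] - w[0] ** 2) ** 2

result = minimize(
    blackbox_cost,
    x0=np.array([-1.0, 1.0]),          # initial weight vector
    method="Nelder-Mead",              # derivative-free simplex search
    options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000},
)
```

Note that Nelder-Mead only ever calls `blackbox_cost` with a candidate weight vector and reads off the returned cost, which is exactly the cost = blackbox_cost_function(weight vector, data) setting described in the question.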