Context: In a lot of my self-study, I come across ways to solve problems that involve optimizing some objective function. (I come from a signal processing background.)
Anyway, I seem to be learning about those methods as I trudge along, in that they are introduced only when there is a problem at hand. This is fine, I suppose, but I do wish I had more of an overarching mental 'map' of them in my head.
For example, I know about the gradient ascent/descent algorithms, and I have also recently learned about a second-order method, Newton's method, which uses curvature information. (I will post a different question about it.)
Gradient Ascent/Descent:
$$ \mathbf{w}_{n+1} = \mathbf{w}_n \pm \alpha\,\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} $$
Newton's Method:
$$ \mathbf{w}_{n+1} = \mathbf{w}_n \pm \left[ \frac{\partial^2 J(\mathbf{w})}{\partial \mathbf{w}^2} \right]^{-1} \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} $$
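To make the two update rules above concrete, here is a minimal sketch on a toy quadratic cost $J(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^\top A \mathbf{w} - \mathbf{b}^\top \mathbf{w}$, where the gradient and Hessian are known in closed form (the matrix `A` and vector `b` are made-up illustration values, not from the question):

```python
import numpy as np

# Toy quadratic cost J(w) = 0.5 * w^T A w - b^T w, minimized at w* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b   # first derivative of J

def hess(w):
    return A           # second derivative of J (constant for a quadratic)

# Gradient descent: w <- w - alpha * grad(w)
w = np.zeros(2)
alpha = 0.1
for _ in range(200):
    w = w - alpha * grad(w)

# Newton's method: w <- w - [hess(w)]^{-1} grad(w)
# (solve the linear system instead of forming the inverse explicitly)
v = np.zeros(2)
for _ in range(5):
    v = v - np.linalg.solve(hess(v), grad(v))
```

For a quadratic cost, Newton's method lands on the minimizer in a single step, while gradient descent approaches it geometrically at a rate set by the step size and the curvature.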
Questions:
1) First, could you summarize other optimization methods that take the same form as the above, i.e. methods that require one to compute the first and/or second derivatives of the cost function a priori?
2) Second, what optimization methods exist that don't require an explicit equation for the cost function, or for its first/second derivatives? For example, suppose I have a black-box cost function that I want to use in an optimization procedure. I would not know its explicit equation or any of its derivatives; it would exist simply as cost = blackbox_cost_function(weight vector, data);. What method(s) would be available to me in this case?
Thanks!
Here are my answers to your questions:
(1) To this list I would also add the non-linear conjugate gradient method, quasi-Newton methods (e.g. L-BFGS), and interior-point methods; there are many more numerical optimization algorithms that use gradient information. Note also that it is common practice in optimization to approximate the gradient by finite differences when you don't have its functional form.
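The finite-difference trick mentioned above can be sketched in a few lines; this is a central-difference approximation (the test function and step size `eps` are illustrative choices, not prescribed by anything here):

```python
import numpy as np

def fd_gradient(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at w."""
    g = np.zeros_like(w, dtype=float)
    for i in range(len(w)):
        e = np.zeros_like(w, dtype=float)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Sanity check: f(w) = sum(w^2) has gradient 2w.
f = lambda w: np.sum(w ** 2)
w0 = np.array([1.0, -2.0, 3.0])
g0 = fd_gradient(f, w0)
```

Each gradient estimate costs 2n function evaluations for an n-dimensional weight vector, which is why derivative-free methods can be attractive when evaluations are expensive.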
(2) Have a look at the Nelder-Mead algorithm, which performs unconstrained optimization without using any derivative information. You will still need to evaluate the objective function at each iteration, though.
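As a concrete sketch, SciPy exposes Nelder-Mead through `scipy.optimize.minimize`; the `blackbox_cost` below is just a stand-in (a Rosenbrock cost) for the questioner's opaque cost function, chosen only so there is something to optimize:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for a black-box cost: only function values are available,
# no derivatives. (Rosenbrock function, with known minimizer at [1, 1].)
def blackbox_cost(w):
    return (1 - w[0]) ** 2 + 100 * (w[1] - w[0] ** 2) ** 2

result = minimize(
    blackbox_cost,
    x0=np.array([-1.0, 1.0]),          # initial weight vector
    method="Nelder-Mead",              # derivative-free simplex search
    options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000},
)
```

Note that Nelder-Mead only ever calls `blackbox_cost` with a candidate weight vector and reads off the returned cost, which is exactly the cost = blackbox_cost_function(weight vector, data) setting described in the question.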