Soft question: Why use optimization algorithms instead of calculus methods?


I know that some functions are multivariate, so it can take a long time to compute the zeros of the gradient and then test for minima, maxima, and so on. However, optimization algorithms have their own costs: they are not always tractable either, and being numerical they never give an analytic solution. So what is the point of abandoning the analytical methods we have from calculus (e.g. solving for gradient zeros) in favor of possibly incorrect and at best approximate optimization algorithms?

Best answer:

The reason to use any numerical method is that you might not have an explicit analytical solution to the problem you're trying to solve. In fact, you might be able to prove (as with the three-body problem) that no analytical solution in terms of elementary functions exists. Thus approximate methods (numerical or perturbation-based) are the best we can do, and when applied correctly (this is important), they usually provide answers with a high degree of accuracy.

An elementary example of this issue (as mentioned in several comments) is finding roots of polynomials of high degree. As Abel and Ruffini proved in the early 19th century, there is no general formula in radicals for the roots of a polynomial of degree 5 or higher. Thus if your derivative is such a polynomial, solving $f^\prime(x) = 0$ generally requires a numerical technique.
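For instance (a sketch, with an objective chosen for illustration, not taken from the answer): take $f(x) = x^6/6 - x^2/2 - x$, whose derivative $x^5 - x - 1$ is a standard example of a quintic that is not solvable by radicals. Newton's method still locates the critical point numerically:

```python
# Newton's method on f'(x) = 0 for f(x) = x**6/6 - x**2/2 - x.
# The derivative x**5 - x - 1 has no expression in radicals for
# its roots, but Newton iteration finds the real one quickly.

def fp(x):   # f'(x)
    return x**5 - x - 1

def fpp(x):  # f''(x), used as the Newton denominator
    return 5 * x**4 - 1

x = 1.2  # initial guess near the single real critical point
for _ in range(50):
    step = fp(x) / fpp(x)
    x -= step
    if abs(step) < 1e-14:  # converged to machine precision
        break

print(x)  # the real root of x**5 - x - 1, approximately 1.167304
```

Quadratic convergence means only a handful of iterations are needed from a reasonable starting point; the trade-off, as with all Newton-type schemes, is that a poor initial guess can send the iteration elsewhere.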

For a more complicated example, often the function you are trying to optimize does not even have an explicit functional form. In calculus, you learn how to optimize functions like $f(x) = x\exp(-x)$. Here you have an explicit formula for which you can take derivatives and solve $f^\prime = 0$. But often you don't have this; instead you have something like: let $\boldsymbol{x}_0$ be the initial condition for a nonlinear system of differential equations
$$ \dot{\boldsymbol{x}} = \boldsymbol{F}(\boldsymbol{x}), \qquad \boldsymbol{x}(0) = \boldsymbol{x}_0. $$ Define $f(\boldsymbol{x}_0) = g(\boldsymbol{x}(1))$, i.e. run the system until $t = 1$, then evaluate some function $g$ there. Such a function will almost never have an explicit formula. Thus if you wanted to compute

$$ \min_{\boldsymbol{x}_0} f(\boldsymbol{x}_0), $$ you would almost certainly be forced to use a numerical method.
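To make this concrete, here is a minimal sketch (every ingredient illustrative, not from the answer): the scalar ODE $\dot{x} = -x^3$ with $g(x) = (x - 0.3)^2$, integrated by classical RK4 and minimized over $x_0$ by golden-section search. This particular ODE happens to be solvable in closed form, which lets us check the result: the optimum is $x_0 = 3/\sqrt{82} \approx 0.3313$.

```python
import math

def flow(x0, T=1.0, n=100):
    """Integrate xdot = -x**3 from t=0 to t=T with classical RK4."""
    h = T / n
    x = x0
    for _ in range(n):
        k1 = -x**3
        k2 = -(x + 0.5 * h * k1)**3
        k3 = -(x + 0.5 * h * k2)**3
        k4 = -(x + h * k3)**3
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def objective(x0):
    """f(x0) = g(x(1)) with g(x) = (x - 0.3)**2 -- no explicit formula
    for x(1) is used; it comes out of the integrator."""
    return (flow(x0) - 0.3)**2

# Golden-section search for the minimizer on [0, 1] (f is unimodal
# there because x(1) is increasing in x0).
invphi = (math.sqrt(5) - 1) / 2
a, b = 0.0, 1.0
c, d = b - invphi * (b - a), a + invphi * (b - a)
for _ in range(80):
    if objective(c) < objective(d):
        b, d = d, c
        c = b - invphi * (b - a)
    else:
        a, c = c, d
        d = a + invphi * (b - a)

x0 = 0.5 * (a + b)
print(x0)  # approximately 3/sqrt(82) = 0.3313
```

Nothing in the search ever sees a formula for $f$; it only evaluates the integrator. That is exactly the situation the answer describes, just in one dimension so the result can be verified by hand.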

I will say that there are methods that are a hybrid of numerical and exact/algebraic. For instance, you can use something like Newton's method (which requires Hessian matrices) together with automatic differentiation. Automatic differentiation treats a computational algorithm as if it were an explicit formula and takes exact derivatives of it using a (highly sophisticated) application of the chain rule.
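As a toy illustration of the idea (a sketch, not the sophisticated tools alluded to above): forward-mode automatic differentiation can be implemented with dual numbers. Below, $f'$ for $f(x) = x e^{-x}$ is obtained by AD rather than by hand, and a simple bisection on $f'(x) = 0$ (used here instead of Newton's method for brevity) locates the critical point at $x = 1$.

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 = 0; b carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):  # product rule, encoded once
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__
    def __neg__(self):
        return Dual(-self.val, -self.dot)

def dexp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.dot)   # chain rule for exp

def fprime(f, x):
    return f(Dual(x, 1.0)).dot  # seed dx/dx = 1, read off df/dx

def f(x):
    return x * dexp(-x)         # f(x) = x * exp(-x)

# Bisection on f'(x) = 0; f' changes sign on [0, 2].
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if fprime(f, mid) > 0:
        lo = mid
    else:
        hi = mid

print(0.5 * (lo + hi))  # critical point of x*exp(-x), i.e. x = 1
```

The derivatives produced this way are exact to floating-point precision (no finite-difference truncation error), which is precisely what makes the hybrid Newton-plus-AD approach attractive; production systems extend the same mechanism to gradients and Hessians of whole programs.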