Optimization - how is using an evolutionary algorithm different from using gradient descent with multiple starting points


Looking at evolutionary algorithms as a way to optimize a function, it seems like one of their biggest advantages is the ability to work on non-convex functions. Because there are many actors in every generation, even if one branch of evolution gets stuck in a local minimum (or maximum), another will carry on, thanks to randomness and mutation.

How is that different from simply running a gradient-descent-style algorithm from many random starting points?


There are 2 best solutions below


Even with many actors placed randomly in the solution space, the gradient method still forces each actor to its nearest local maximum, so it is still possible that no actor finds the global maximum.
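As a small illustration (the toy objective, learning rate, and starting range below are my own choices, not from the answer): ten independent gradient-ascent runs on a function with a global maximum at x = 0 and a smaller local maximum near x = 3. Every run deterministically climbs to the top of whichever basin it started in.

```python
import math
import random

def f(x):
    # Toy objective: global maximum at x = 0, smaller local maximum near x = 3.
    return math.exp(-x * x) + 0.5 * math.exp(-(x - 3.0) ** 2)

def grad_f(x, h=1e-5):
    # Central-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

def gradient_ascent(x0, lr=0.1, steps=500):
    x = x0
    for _ in range(steps):
        x += lr * grad_f(x)
    return x

random.seed(0)
starts = [random.uniform(-2.0, 5.0) for _ in range(10)]
finals = [gradient_ascent(x0) for x0 in starts]
# Each final point sits near x = 0 or near x = 3: pure gradient following
# never moves an actor out of the basin it started in.
```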

For an evolutionary algorithm, on the other hand, there is always a chance that random mutations carry an actor to the global maximum. In fact, this is possible even with a single actor per generation.
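To make the "one actor per generation" point concrete, here is a sketch of a (1+1) evolution strategy on the same kind of toy objective (the function and mutation scale are illustrative assumptions): a single parent, one Gaussian mutant per generation, keep the better of the two. Even when started on the local maximum, a lucky mutation can jump into the global basin.

```python
import math
import random

def f(x):
    # Toy objective: global maximum at x = 0, smaller local maximum near x = 3.
    return math.exp(-x * x) + 0.5 * math.exp(-(x - 3.0) ** 2)

def one_plus_one_es(x0, sigma=1.0, generations=2000, seed=1):
    # (1+1) evolution strategy: one parent, one mutant, elitist selection.
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    for _ in range(generations):
        child = x + rng.gauss(0.0, sigma)
        fc = f(child)
        if fc >= fx:  # keep the mutant only if it is at least as good
            x, fx = child, fc
    return x

# Start deliberately on the *local* maximum; a large enough mutation can
# still land in the global basin, and elitism then keeps the gain.
best = one_plus_one_es(3.0)
```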


I think the difference is better framed in terms of the kind of search space you are interested in, rather than the exact behaviour at local minima.

For extremely noisy or discontinuous objective functions, the gradient is essentially useless, and computation time is better spent exploring disparate parts of the space than walking around locally. For smooth, convex objectives, an evolutionary approach is instead computationally wasteful.

As others have mentioned, given enough restarts one should in principle find the global maximum (although pathological cases exist, where the local minima "lure" every agent away from the global extremum). The EA will not have this pathology.

I think it is important to note that, in practice, for noisy objective functions gradient descent is often modified into stochastic gradient descent or simulated annealing. Both are ways to escape local minima, and both can be combined with random restarts.
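A minimal simulated-annealing sketch (the toy objective, step size, and cooling schedule are my own illustrative choices): worsening moves are accepted with probability exp(-Δ/T), so early in the run the walker can climb out of a local minimum, and tracking the best point seen makes the method elitist.

```python
import math
import random

def g(x):
    # Toy objective to *minimize*: global minimum at x = 0, local minimum near x = 3.
    return -(math.exp(-x * x) + 0.5 * math.exp(-(x - 3.0) ** 2))

def simulated_annealing(x0, t0=1.0, cooling=0.995, steps=2000, seed=2):
    rng = random.Random(seed)
    x, gx = x0, g(x0)
    x_best, g_best = x, gx
    t = t0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 0.5)
        gc = g(cand)
        # Accept improvements always; accept uphill moves with probability
        # exp(-delta / T), which shrinks as the temperature cools.
        if gc <= gx or rng.random() < math.exp(-(gc - gx) / t):
            x, gx = cand, gc
            if gx < g_best:
                x_best, g_best = x, gx
        t *= cooling
    return x_best

# Start on the slope above the local minimum; early high-temperature moves
# let the walker wander freely, while the best-seen point is never lost.
x_best = simulated_annealing(3.5)
```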

For evolutionary algorithms, one such modification is the memetic algorithm, which combines a global evolutionary search with local (often gradient-descent-style) refinement. So the two approaches are less different than one might think.
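A toy memetic sketch along those lines (the objective, population size, and the short gradient-ascent "polish" are illustrative assumptions, not a standard library API): each offspring is produced by Gaussian mutation and then refined by a few local gradient-ascent steps before (mu + lambda) selection.

```python
import math
import random

def f(x):
    # Toy objective: global maximum at x = 0, smaller local maximum near x = 3.
    return math.exp(-x * x) + 0.5 * math.exp(-(x - 3.0) ** 2)

def grad_f(x, h=1e-5):
    # Central-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

def local_refine(x, lr=0.1, steps=20):
    # The "meme": a short gradient-ascent polish applied to every offspring.
    for _ in range(steps):
        x += lr * grad_f(x)
    return x

def memetic(pop_size=8, generations=50, sigma=1.0, seed=3):
    rng = random.Random(seed)
    pop = [rng.uniform(-5.0, 8.0) for _ in range(pop_size)]
    for _ in range(generations):
        children = [x + rng.gauss(0.0, sigma) for x in pop]
        refined = [local_refine(c) for c in children]
        # (mu + lambda) selection: keep the best pop_size of parents + offspring.
        pop = sorted(pop + refined, key=f, reverse=True)[:pop_size]
    return pop[0]

champion = memetic()
```

The global mutation step supplies basin-hopping exploration, while the gradient polish does the fine convergence that mutation alone is slow at.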