Looking at evolutionary algorithms as a way to optimize a function, it seems that one of their biggest advantages is the ability to work on non-convex functions. Because each generation contains many actors, even if one branch of evolution gets stuck in a local minimum (or maximum), another branch can move on, thanks to randomness and mutation.
How is that different from having a gradient-descent-style algorithm simply start at many random starting points?
Even with many actors placed randomly in the solution space, the gradient method will still drive each actor to the local maximum of the basin it started in, so it is still possible that no actor finds the global maximum.
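To make that concrete, here is a minimal sketch of multi-start gradient ascent. The objective `f` (one global peak near x ≈ 1.25, several lower sine peaks elsewhere) and the finite-difference gradient are both invented for illustration. Each restart climbs deterministically to the top of whichever basin it happened to land in, so the global peak is found only if at least one start lands inside its basin:

```python
import math
import random

# Toy 1-D multimodal objective (invented for illustration):
# sine peaks of height ~1.0 everywhere, plus a bump near x = 0
# that raises one peak (near x ~ 1.25) to ~1.08 -- the global maximum.
def f(x):
    return math.sin(x) + 0.6 * math.exp(-x * x)

def grad(x, h=1e-6):
    # Central finite-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_ascent(x, lr=0.05, steps=2000):
    # Plain fixed-step gradient ascent: always moves uphill,
    # so it can only ever converge to the local peak of x's basin.
    for _ in range(steps):
        x += lr * grad(x)
    return x

# Many random restarts; each one climbs to its own basin's peak.
random.seed(0)
starts = [random.uniform(-10, 10) for _ in range(20)]
finishes = [gradient_ascent(x0) for x0 in starts]
best = max(finishes, key=f)
print(f"best multi-start value: {f(best):.3f}")
```

With enough restarts the global basin is likely to be sampled, but nothing *forces* that: the method's only exploration happens at initialization, and every step afterwards is strictly local.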
With an evolutionary algorithm, on the other hand, there is always a chance that random mutations carry an actor to the global maximum. In fact, this is possible even with a single actor per generation.
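Here is a sketch of that single-actor case: a (1+1) evolution strategy on the same kind of toy objective (again invented for illustration). The actor starts *on* a local peak, where gradient ascent would stay forever, but a Gaussian mutation occasionally jumps several basins over and lands in the global one:

```python
import math
import random

# Same toy objective: sine peaks of height ~1.0, global peak ~1.08 near x ~ 1.25.
def f(x):
    return math.sin(x) + 0.6 * math.exp(-x * x)

def one_plus_one_es(x, sigma=2.0, generations=20000, seed=1):
    # (1+1)-ES: one parent, one Gaussian mutant per generation,
    # and greedy selection (keep the mutant only if it is at least as fit).
    rng = random.Random(seed)
    for _ in range(generations):
        child = x + rng.gauss(0, sigma)
        if f(child) >= f(x):
            x = child
    return x

# Start exactly on a local peak (x ~ -3*pi/2, f ~ 1.0):
# the gradient there is zero, so gradient ascent would never move.
peak = one_plus_one_es(x=-4.7)
print(f"escaped to x = {peak:.2f}, f = {f(peak):.3f}")
```

The mutation step never shrinks to zero, so a basin-crossing jump always has nonzero probability each generation; the gradient actor's "step", by contrast, vanishes exactly at the local peak. That is the structural difference, rather than just a difference in how the starting points are chosen.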