When I run nonlinear optimization code, I often encounter the claim that no nonlinear optimization routine is guaranteed to reach a global optimum. Instead, it is recommended that care be taken in choosing the initial parameter guess, so that the optimizer is more likely to converge to the global optimum. For example, if I try to fit a nonlinear regression, I need a really good initial guess for the minimization of chi^2 (the sum of squared errors between the fit and the data) to actually reach the global optimum.
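To make this concrete, here is a toy illustration of what I mean (my own minimal sketch, not from any particular library): fitting the frequency w of a sinusoid y = sin(w x) by minimizing chi^2 with a crude derivative-free local search. The data, the model, and the `local_minimize` helper are all hypothetical choices for the example.

```python
import math

# Synthetic data: y = sin(3 x), so the true frequency is w = 3.
xs = [0.1 * i for i in range(100)]
ys = [math.sin(3.0 * x) for x in xs]

def chi2(w):
    """Sum of squared residuals for the model y = sin(w x)."""
    return sum((math.sin(w * x) - y) ** 2 for x, y in zip(xs, ys))

def local_minimize(f, w, step=0.1, tol=1e-8):
    """Crude derivative-free descent: move downhill while possible,
    halve the step when stuck. Like any local method, it only finds
    the nearest local minimum of f."""
    while step > tol:
        if f(w + step) < f(w):
            w += step
        elif f(w - step) < f(w):
            w -= step
        else:
            step *= 0.5
    return w

w_good = local_minimize(chi2, 2.9)  # starts inside the basin of the global minimum
w_bad = local_minimize(chi2, 0.5)   # starts far away and gets trapped

print(w_good, chi2(w_good))  # near 3.0, chi^2 near 0
print(w_bad, chi2(w_bad))    # far from 3.0, chi^2 large
```

Starting at w = 2.9 recovers the true frequency, while starting at w = 0.5 gets stuck in a local minimum of the highly oscillatory chi^2 landscape. This is the kind of initial-guess sensitivity I am asking about.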
On the other hand, neural networks seem able to perform extremely sophisticated nonlinear regression, and this is typically presented as if it requires no problem-specific hand-tuning, such as supplying a good initial guess.
For example, I don't need to first supply a good guess of whether a picture contains a cat in order for the network to converge to the right answer.
What is the reason for the difference between these two situations?