Difference between local multivariate optimization and stochastic gradient descent?


Sorry if this sounds like a basic question, but I don't understand the difference between local multivariate numerical optimization (minimization) and the stochastic gradient descent methods used in neural networks.

  • Examples of multivariate numerical optimizers: the Nelder-Mead algorithm, Powell's method, the conjugate gradient algorithm, the BFGS algorithm, etc.

  • Examples of stochastic gradient descent variants: ISGD, momentum, averaged stochastic gradient descent, AdaGrad, Adam, etc.
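To make the contrast concrete, here is a minimal sketch (my own illustrative example, not from any particular library) fitting y = 2x by least squares. The classical optimizers above work with the exact gradient (or function values) over the whole dataset at each step, while SGD uses a noisy gradient estimate from a single randomly drawn sample:

```python
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x for x in xs]  # true slope is 2

def full_grad(w):
    # exact gradient of the mean squared error over the entire dataset,
    # as a deterministic (batch) optimizer would use
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def stoch_grad(w):
    # gradient estimated from one randomly drawn sample, as SGD uses
    i = random.randrange(len(xs))
    return 2 * (w * xs[i] - ys[i]) * xs[i]

w_batch, w_sgd = 0.0, 0.0
for _ in range(500):
    w_batch -= 0.5 * full_grad(w_batch)  # deterministic: same trajectory every run
    w_sgd -= 0.5 * stoch_grad(w_sgd)     # noisy: trajectory depends on the samples drawn

print(w_batch, w_sgd)  # both end up close to the true slope 2.0
```

Each full-gradient step here costs a pass over all 100 points, while each stochastic step touches only one; with millions of training examples, that per-step cost difference is a large part of why SGD variants dominate in neural network training.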

What is the difference between these two classes of algorithms, when should each be used, and for what types or topologies of functions? To my understanding both classes are used in neural network systems, so when should each be preferred?