Math behind various optimizers in deep learning


I am looking to learn the math behind the various optimizers used in neural networks, such as Adam, SGD, etc. Through this, I want to understand what makes one optimizer better than another for a particular problem. Is anyone aware of resources that explain the math behind the various optimizers in deep learning?


1 answer below

Accepted answer:

These algorithms are based on linear algebra, multivariable calculus, and probability theory (when the algorithms are stochastic). Rather than attempting to learn generic results from those branches first, I would recommend starting with reviews of different optimisation algorithms, such as 1 or the more in-depth 2, and referring back to Khan Academy/Wikipedia whenever some of the mathematical formalism is unclear.
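To make the update rules concrete, here is a minimal sketch comparing plain gradient descent with Adam on a toy one-dimensional quadratic. The objective, hyperparameters, and iteration counts are my own illustrative choices, not from any of the cited reviews; the Adam update follows the standard formulation with bias correction.

```python
import math

def grad(x):
    # Gradient of the toy objective f(x) = (x - 3)^2, minimised at x = 3.
    return 2.0 * (x - 3.0)

# --- plain gradient descent (SGD without the "stochastic" part) ---
x = 0.0
lr = 0.1
for _ in range(100):
    x -= lr * grad(x)                  # step against the gradient
x_gd = x

# --- Adam: adaptive step sizes from running moment estimates ---
x, m, v = 0.0, 0.0, 0.0
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8
for t in range(1, 5001):
    g = grad(x)
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for the zero-initialised moments
    v_hat = v / (1 - b2 ** t)
    x -= lr * m_hat / (math.sqrt(v_hat) + eps)
x_adam = x

print(x_gd, x_adam)  # both end up close to the minimum at 3
```

The contrast already hints at why one optimizer can suit a problem better than another: gradient descent's step shrinks with the gradient, while Adam normalises by the gradient's recent magnitude, which helps on badly scaled problems but can oscillate near a minimum.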

Once you have gained an overview and some intuition for a few basic algorithms, you should start reading the relevant literature on optimisation in deep learning, cf. 3, 4 for reviews, and then follow the relevant references from those articles.