I am struggling to prove convergence for an optimizer that combines an adaptive step size with the heavy-ball method, for both convex and non-convex functions. In the literature I have found regret-bound analyses for the convex case, together with proofs that the estimated gradient goes to zero as t -> inf.
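To make the setup concrete, here is a minimal sketch of the iteration I have in mind. The question does not pin down the adaptive rule, so I am assuming an AdaGrad-style step size as one plausible choice; the function names and constants are illustrative only:

```python
import numpy as np

def heavy_ball_adaptive(grad, x0, steps=500, beta=0.9, eps=1e-8):
    """Heavy-ball iteration with an (assumed) AdaGrad-style step size:
        x_{t+1} = x_t - alpha_t * grad(x_t) + beta * (x_t - x_{t-1})
    where alpha_t = 1 / (sqrt(sum_{s<=t} ||g_s||^2) + eps)."""
    x_prev = x0.copy()
    x = x0.copy()
    g_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        g_sq_sum += float(g @ g)
        alpha = 1.0 / (np.sqrt(g_sq_sum) + eps)  # adaptive step size
        x_next = x - alpha * g + beta * (x - x_prev)  # momentum term
        x_prev, x = x, x_next
    return x

# Sanity check on a convex quadratic f(x) = 0.5 * ||x||^2, so grad f(x) = x
x_star = heavy_ball_adaptive(lambda x: x, np.array([5.0, -3.0]))
```

On this quadratic the iterates contract toward the minimizer at the origin, which matches the convex-case behavior the regret bounds predict; the non-convex case is exactly what I cannot yet prove.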
Could anyone please guide me through a convergence proof for non-convex functions, or recommend literature on the topic?
Thank you very much in advance.