I'm a student of machine learning and consequently also a student of physics, mathematical optimization, and statistics. One of the most elegant algorithms I have bumped into in my studies is Hamiltonian Monte Carlo (HMC), which uses Hamiltonian mechanics and gradient information to improve the convergence of the Markov Chain Monte Carlo (MCMC) method. My understanding of HMC is still on the shallow end, but from a high-level perspective it somewhat resembles the Stochastic Gradient Descent (SGD) method.
My question thus is: what is the key difference between the SGD and HMC methods? There are of course technical differences, but what is the difference on a fundamental, conceptual level? Or is the basic high-level idea the same: "use gradient descent with a random component to avoid local optima"?
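To make the comparison concrete, here is a rough toy sketch of my current mental model, using a 1-D standard normal target whose log-density is $-x^2/2$. All function names and parameter values here are my own invention, not from any library, and the HMC step is only my bare-bones reading of the algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: standard normal, log p(x) = -x^2 / 2
def grad_log_p(x):
    return -x

def sgd_step(x, lr=0.1):
    # SGD (here plain gradient ascent on log p): follow the gradient uphill
    return x + lr * grad_log_p(x)

def hmc_step(x, step=0.1, n_leapfrog=20):
    # HMC: sample a random momentum, simulate Hamiltonian dynamics with
    # the leapfrog integrator, then accept/reject so the chain samples p(x)
    p = rng.standard_normal()
    x_new, p_new = x, p
    p_new += 0.5 * step * grad_log_p(x_new)       # half momentum step
    for _ in range(n_leapfrog - 1):
        x_new += step * p_new                     # full position step
        p_new += step * grad_log_p(x_new)         # full momentum step
    x_new += step * p_new                         # final position step
    p_new += 0.5 * step * grad_log_p(x_new)       # final half momentum step
    # Metropolis correction: H(x, p) = -log p(x) + p^2 / 2
    h_old = 0.5 * x**2 + 0.5 * p**2
    h_new = 0.5 * x_new**2 + 0.5 * p_new**2
    return x_new if rng.random() < np.exp(h_old - h_new) else x

# SGD converges to the mode (x = 0) ...
x_sgd = 3.0
for _ in range(200):
    x_sgd = sgd_step(x_sgd)

# ... while HMC keeps wandering over the whole distribution
samples = []
x_hmc = 3.0
for _ in range(2000):
    x_hmc = hmc_step(x_hmc)
    samples.append(x_hmc)
```

As I understand it, the SGD iterate collapses to a single point (the optimum), while the HMC samples spread out with roughly the mean and standard deviation of the target distribution; is that contrast the essence of the difference?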