Nesterov momentum as defined by Aggarwal in Neural Networks and Deep Learning is
However, I've tweaked the update rule, and I wanted to ask whether the change makes sense.
I made this change because momentum-based learning, as explained in Machine Learning Refined by Watt et al., uses an exponential average whose weights sum to 1, and I wanted Nesterov momentum to have the same property.
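To make the idea concrete, here is a minimal Python sketch of one way the two updates could be compared. It is only an assumption about what such a tweak might look like, not a statement of either book's exact formula: the function names `ema_weights` and `nesterov_step`, the `normalized` flag, and the choice to scale the gradient term by `(1 - beta)` are all hypothetical and chosen for illustration.

```python
def ema_weights(beta, steps):
    """Weights an exponential average places on the last `steps` gradients,
    newest first: (1 - beta) * beta**k."""
    return [(1 - beta) * beta**k for k in range(steps)]


def nesterov_step(w, v, grad, lr, beta, normalized=False):
    """One Nesterov-style update on a scalar parameter `w` with velocity `v`.

    normalized=False: v <- beta*v - lr * grad(w + beta*v)
    normalized=True:  v <- beta*v - (1 - beta) * lr * grad(w + beta*v)
                      (assumed variant whose gradient weights sum to 1)
    """
    lookahead = w + beta * v          # evaluate the gradient at the look-ahead point
    scale = (1 - beta) if normalized else 1.0
    v = beta * v - scale * lr * grad(lookahead)
    return w + v, v


def steady_state_velocity(lr, beta, normalized, steps=500):
    """Velocity reached under a constant gradient of 1.0 (toy check)."""
    w, v = 0.0, 0.0
    for _ in range(steps):
        w, v = nesterov_step(w, v, lambda x: 1.0, lr, beta, normalized)
    return v
```

Under a constant gradient, the unnormalized velocity settles near `-lr / (1 - beta)`, while the normalized one settles near `-lr`, which is the "weights sum to 1" effect. Note that for a fixed `beta`, this particular normalized form is the same as the unnormalized one run with learning rate `(1 - beta) * lr`, so in this sketch the change amounts to a rescaling of the learning rate.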