I'm working on implementing gradient descent with momentum for root finding, but I'm slightly confused about one part of the update. It's said that you can replace the regular gradient step with $$d_n = \gamma d_{n-1} + 2J^TF$$ and then update the iterate $x$ as $$x_{n+1} = x_n - \lambda d_n$$
When calculating $d_n$, is the $2$ multiplied by the resulting matrix of $J^TF$, or is it multiplied by $J^T$ first, with that product then multiplied by $F$?
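A quick way to check this numerically: because scalar multiplication commutes with matrix multiplication, both groupings give the same result. A minimal sketch with a made-up $2\times 2$ Jacobian and residual vector:

```python
import numpy as np

# Hypothetical toy values for the Jacobian J and residual F
J = np.array([[1.0, 2.0], [3.0, 4.0]])
F = np.array([0.5, -1.0])

# Scalar multiplication commutes with matrix multiplication,
# so both groupings produce the same vector:
a = 2 * (J.T @ F)    # 2 * (J^T F)
b = (2 * J.T) @ F    # (2 J^T) F
print(np.allclose(a, b))  # True
```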
This is my current implementation, which I'm not sure is correct:
gamma = 0.8   # momentum factor
lamb = 0.01   # step size
dn_prev = 0   # previous descent direction (momentum term)
while (norm(F, 2) > tol and n <= max_iterations):
    # Jacobian matrix evaluation
    J = eval(jac)(x, 2, fnon, F, *fnonargs)
    # Gradient descent step with momentum
    # (old gradient descent: delta = -2 * np.matmul(np.transpose(J), F))
    dn = gamma * dn_prev + 2 * np.matmul(np.transpose(J), F)
    dn_prev = dn
    # Update the iterate
    x = x - lamb * dn
    # (F and n are re-evaluated elsewhere before the loop test)
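For reference, here is a self-contained sketch of the same momentum scheme on a hypothetical toy system (the system `F`, its root $(2, 1)$, and all parameter values are my own assumptions, not from the original code):

```python
import numpy as np

# Hypothetical toy system: F(x) = 0 has the root x = (2, 1)
def F(x):
    return np.array([x[0] + x[1] - 3.0, x[0] - x[1] - 1.0])

def J(x):
    # Jacobian of F (constant here because F is linear)
    return np.array([[1.0, 1.0], [1.0, -1.0]])

x = np.zeros(2)
d_prev = np.zeros(2)      # momentum term starts at zero
gamma, lamb = 0.8, 0.01   # momentum factor and step size
tol, max_iterations = 1e-8, 10000

n = 0
while np.linalg.norm(F(x), 2) > tol and n <= max_iterations:
    # Gradient of ||F(x)||^2 is 2 J^T F; add the momentum term
    d = gamma * d_prev + 2 * (J(x).T @ F(x))
    x = x - lamb * d
    d_prev = d
    n += 1

print(x)  # close to the root (2, 1)
```

Note that `F` and the iteration counter `n` are re-evaluated on every pass, which the snippet in the question leaves implicit.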