Understanding gradient descent with momentum equations


I'm working on implementing gradient descent with momentum for root finding, but I'm slightly confused by one part of the update. It's said that you can replace the regular gradient step with $$d_n = \gamma d_{n-1}+2J^TF$$ and then update the iterate $x$ as $$x_{n+1}=x_n-\lambda d_n.$$

When calculating $d_n$, is the $2$ multiplied by the resulting matrix of $J^TF$, or is it multiplied by $J^T$ first and that product then multiplied by $F$?
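For what it's worth, a quick NumPy check (with a small made-up $J$ and $F$ of my own, not from my actual code) suggests the two orderings agree, since $2$ is just a scalar:

```python
import numpy as np

# Small hypothetical Jacobian J (3x2) and residual vector F (3,)
J = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
F = np.array([1.0, -1.0, 2.0])

a = 2 * (J.T @ F)    # scale the product J^T F
b = (2 * J.T) @ F    # scale J^T first, then multiply by F

print(np.allclose(a, b))  # scalar multiplication associates with matrix products
```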

This is my current implementation, which I'm not sure is correct:

dn = 0
gamma = 0.8
dn_prev = 0

while (norm(F, 2) > tol and n <= max_iterations):
    # Jacobian matrix evaluation
    J = eval(jac)(x, 2, fnon, F, *fnonargs)

    # Momentum step: d_n = gamma*d_{n-1} + 2*J^T*F
    dn = gamma*dn_prev + 2*(np.matmul(np.transpose(J), F))

    dn_prev = dn  # was `dk`, a typo: store the current direction for the next iteration
    # old gradient descent: delta = -2 * np.matmul(np.transpose(J), F)  # descent direction

    lamb = 0.01
    x = x - lamb*dn
    # Note: F and n also need updating inside the loop (re-evaluate F at the
    # new x and increment n), otherwise the loop condition never changes.
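To convince myself the update works at all, here is a self-contained sketch of the same momentum iteration on a toy root-finding problem. The system $F$, its Jacobian, the starting point, and the tolerances are illustrative choices of mine, not from my actual assignment code:

```python
import numpy as np

def F(x):
    # Toy residual: roots satisfy x0^2 + x1^2 = 1 and x0 = x1
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def jacobian(x):
    # Analytic Jacobian of F
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0,        -1.0]])

x = np.array([1.0, 0.5])       # starting guess near the positive root
dn = np.zeros_like(x)          # momentum term starts at zero
gamma, lamb, tol = 0.8, 0.01, 1e-8

for n in range(10000):
    Fx = F(x)
    if np.linalg.norm(Fx) <= tol:
        break
    J = jacobian(x)
    dn = gamma * dn + 2.0 * (J.T @ Fx)   # d_n = gamma*d_{n-1} + 2 J^T F
    x = x - lamb * dn                    # x_{n+1} = x_n - lambda*d_n

print(x)  # should end up close to (1/sqrt(2), 1/sqrt(2))
```

This minimizes $\lVert F(x)\rVert^2$, whose gradient is exactly the $2J^TF$ term in the update above.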