Can gradient descent be written without time step?


I am trying to learn gradient descent for machine learning. In this highly cited research paper https://arxiv.org/pdf/1609.04747.pdf, the author presents gradient descent as

$$\theta = \theta - \eta \nabla_\theta J(\theta)$$

I have never seen this expression before. Is this some analytical formula for calculating the variables $\theta$? Wouldn't the $\theta$ be cancelled out? I am confused, please help.

As @CogitoErgoCogitoSum mentioned in the comments, the expression is an update rule, not an equation to solve, and the iteration is more clearly written with an explicit index: $$ \theta^{k+1} = \theta^k - \eta \nabla J(\theta^k). $$ Starting at the point $\theta^k$, we take a step in the direction of steepest descent (that is, the negative gradient direction), which moves us to a new point $\theta^{k+1}$ where, for a sufficiently small step size $\eta$, the value of $J$ has been reduced. The paper's notation $\theta = \theta - \eta \nabla_\theta J(\theta)$ is programming-style assignment (overwrite $\theta$ with the right-hand side), so nothing cancels.
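The iteration can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation: the quadratic objective $J(\theta) = \|\theta - t\|^2$, the target $t$, the learning rate, and the iteration count are all illustrative choices.

```python
import numpy as np

def grad_J(theta, target):
    # Gradient of the stand-in objective J(theta) = ||theta - target||^2.
    return 2.0 * (theta - target)

target = np.array([3.0, -1.0])  # minimizer of J (illustrative)
theta = np.zeros(2)             # theta^0: the starting point
eta = 0.1                       # learning rate (step size)

for k in range(100):
    # The update rule: theta^{k+1} = theta^k - eta * grad J(theta^k).
    # Writing "theta = theta - ..." overwrites theta, matching the
    # paper's assignment-style notation.
    theta = theta - eta * grad_J(theta, target)

print(theta)  # close to [3.0, -1.0]
```

Each pass of the loop performs one step of the iteration; after enough steps, `theta` converges to the minimizer of this convex objective.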