The ODE modeling of gradient descent with diminishing stepsize


The gradient descent (GD) with constant stepsize $\alpha^{k}=\alpha$ takes the form $$x^{k+1} = x^{k} -\alpha\nabla f(x^{k}).$$ Then, by constructing a continuous-time version of the GD iterates satisfying $X(k\alpha)=x^{k}$ and taking $\alpha\to 0$, we can obtain a limiting ODE for constant-stepsize GD of the form $$\lim_{\alpha\to 0}\frac{X(t+\alpha)-X(t)}{\alpha} = -\nabla f(X(t))\Rightarrow\frac{dX(t)}{dt} = -\nabla f(X(t)).$$
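As a quick numerical sanity check of this limit (a minimal sketch of my own; the quadratic test function $f(x)=x^2/2$ is an assumption, chosen because its gradient flow has the closed-form solution $X(t)=X(0)e^{-t}$), the constant-stepsize GD iterates should approach the gradient-flow solution as $\alpha\to 0$:

```python
import math

def f_grad(x):
    # gradient of the test function f(x) = x**2 / 2
    return x

def gd(x0, alpha, n_steps):
    # plain gradient descent with constant stepsize alpha
    x = x0
    for _ in range(n_steps):
        x = x - alpha * f_grad(x)
    return x

# Compare the GD iterate at "time" t = n * alpha with the gradient-flow
# solution X(t) = X(0) * exp(-t), shrinking alpha while holding t fixed.
x0, t = 2.0, 1.0
exact = x0 * math.exp(-t)
for alpha in [0.1, 0.01, 0.001]:
    n = int(round(t / alpha))
    approx = gd(x0, alpha, n)
    print(alpha, abs(approx - exact))  # error shrinks as alpha -> 0
```

Here $x^k = x^0(1-\alpha)^k$, and $(1-\alpha)^{t/\alpha}\to e^{-t}$, so the discrepancy vanishes with $\alpha$.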

My question is: if we use a diminishing stepsize of the form $\alpha^k = \alpha/(k+1)$, can we derive a corresponding ODE as $\alpha\to 0$? I guess the limiting ODE for diminishing-stepsize GD might take the form $$\frac{dX(t)}{dt} = -\frac{1}{t+1}\nabla f(X(t)).$$

Thanks!

1 Answer

First, I'm guessing you made a typo in the Gradient Descent (GD) formula, using subtraction rather than multiplication with the stepsize $\alpha$; it should be $$x^{(k+1)}=x^{(k)}-\alpha\cdot\nabla f\left(x^{(k)}\right).$$ (I have also added parentheses in the superscript to avoid confusion with exponentiation.)

The stepsize is a parameter you choose when training with GD. The $\alpha$ appearing in the ODE derivation plays a related role, but it is independent of the actual stepsize $\alpha$ you use.

How you reduce the stepsize over the course of training does not change the ODE, since the ODE is defined by taking the limit as the so-called "stepsize" goes to 0.
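One way to see this numerically (a minimal sketch of my own; the test function $f(x)=x^2/2$ and the choice to measure "ODE time" as the cumulative stepsize $t_k=\sum_{j<k}\alpha^j$ are my assumptions, not from the original post): with diminishing stepsizes $\alpha^k=\alpha/(k+1)$, the iterates still track the same gradient flow $dX/dt=-\nabla f(X)$, just traversed at a slower clock.

```python
import math

def f_grad(x):
    # gradient of the test function f(x) = x**2 / 2
    return x

# Diminishing-stepsize GD with alpha_k = alpha / (k + 1).
# Measuring time as the running sum of stepsizes t_k = sum_j alpha_j,
# the iterates should track the gradient-flow solution X(t) = X(0)*exp(-t).
x0, alpha, n_steps = 2.0, 0.001, 5000
x, t = x0, 0.0
for k in range(n_steps):
    step = alpha / (k + 1)
    x = x - step * f_grad(x)
    t += step
print(abs(x - x0 * math.exp(-t)))  # small: same flow, reparametrized time
```

The stepsize schedule only changes how fast the trajectory is traversed, not which ODE it follows.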

And to answer your comment: the stepsize is only relevant in the discrete GD steps. It does not appear in the ODE, except perhaps through the infinitesimal variable $\alpha$, which is already taken to 0 in the limit; so changing the stepsize schedule affects neither the original function $f$ nor the limit variable $\alpha$ (which is not the same as the stepsize).