Is gradient descent nothing other than discretized gradient flow?


Definitions of the gradient flow of a scalar field $f$, $$\frac{d \xi}{dt} = - \nabla_\xi f,$$ can be found here and here.

Gradient descent can be used to find a minimum of $f$ and can be written as $$\xi_{i+1} = \xi_{i} - \lambda \nabla_\xi f,$$ where $\lambda$ is a constant scalar step size.
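To make the iteration concrete, here is a minimal sketch of the update rule, assuming the hypothetical example $f(\xi) = (\xi - 3)^2$ with gradient $2(\xi - 3)$ and minimum at $\xi = 3$ (none of these specifics are from the question itself):

```python
# Gradient descent on the hypothetical example f(xi) = (xi - 3)**2.
# Its gradient is grad_f(xi) = 2*(xi - 3), and its minimum sits at xi = 3.
def grad_f(xi):
    return 2.0 * (xi - 3.0)

xi = 0.0    # starting point
lam = 0.1   # constant step size lambda
for _ in range(200):
    xi = xi - lam * grad_f(xi)  # xi_{i+1} = xi_i - lambda * grad f(xi_i)

print(xi)   # converges toward the minimizer xi = 3
```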

As far as I understand, gradient descent is nothing other than discretized gradient flow. So I set $\lambda = 1$ to obtain $$\frac{d \xi}{dt} = \lim_{\Delta t \rightarrow 0} \frac{\xi_{i+1} - \xi_i}{\Delta t} = - \lim_{\Delta t \rightarrow 0} \frac{\nabla_\xi f}{\Delta t} = - \frac{d}{dt} \nabla_\xi f \stackrel{???}{=} - \nabla_\xi f = \frac{d \xi}{dt}.$$

This does not make much sense to me. Shouldn't both expressions for $\frac{d \xi}{dt}$ agree? What mistake am I making? Could anyone help me, please?



Accepted answer:

What you are missing is that $\lambda$ "is" the time scale $\Delta t$. So you should rather interpret the first formula as $$\xi_{t+\Delta t}:=\xi_t - (\Delta t)\nabla_{\xi_t}f,$$ and with this you get what you want.
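In other words, the update is one forward-Euler step of the ODE. A small numerical sketch of this (assuming the hypothetical choice $f(\xi) = \xi^2/2$, so $\nabla_\xi f = \xi$ and the exact flow from $\xi_0$ is $\xi(t) = \xi_0 e^{-t}$):

```python
import math

# One forward-Euler step of d(xi)/dt = -grad_f(xi), assuming f(xi) = xi**2 / 2,
# so grad_f(xi) = xi and the exact flow is xi(t) = xi0 * exp(-t).
def euler_step(xi, dt):
    return xi - dt * xi          # xi_{t+dt} = xi_t - dt * grad_f(xi_t)

xi0, dt = 1.0, 0.01
xi = xi0
for _ in range(int(round(1.0 / dt))):  # integrate up to t = 1
    xi = euler_step(xi, dt)

exact = xi0 * math.exp(-1.0)
print(xi, exact)  # the Euler iterate lands close to exp(-1)
```

Interpreting $\lambda$ as $\Delta t$ this way, the discrete iterates track the continuous flow up to the usual Euler discretization error.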

Second answer:

I think the confusion might stem from the ambiguity in notation. The definition of gradient flow is $\frac{d \xi}{dt}(t_0) = - \nabla f(\xi(t_0))$, where the right-hand side is the gradient of $f$ evaluated at $\xi(t_0)$. (This is closer to the notation used in the second link.)

Further, the discrete step is $\xi_{i+1} = \xi_i - \lambda \nabla f(\xi_i)$. This is simply a rearrangement of $$-\nabla f(\xi(t_i)) = \frac{d\xi}{dt}(t_i) \approx \frac{\xi(t_i + \lambda) - \xi(t_i)}{\lambda},$$ where $t_i$ is such that $\xi_i = \xi(t_i)$. In particular, $\lambda$ is the parameter that should be driven to zero to see the asymptotic correspondence with gradient flow; you cannot just set it to $\lambda = 1$.
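This $\lambda \to 0$ correspondence can be checked numerically. A sketch under the hypothetical choice $f(\xi) = \xi^2/2$ (so $\nabla f(\xi) = \xi$ and the exact flow from $\xi_0 = 1$ is $e^{-t}$): halving $\lambda$ while holding the total time $T = n\lambda$ fixed roughly halves the error, as expected for a first-order (Euler) discretization.

```python
import math

# Error between gradient descent with step lambda and the exact gradient flow,
# assuming the hypothetical f(xi) = xi**2 / 2, whose flow from xi0 is xi0*exp(-t).
def descent_error(lam, T=1.0, xi0=1.0):
    xi = xi0
    for _ in range(int(round(T / lam))):
        xi = xi - lam * xi       # xi_{i+1} = xi_i - lam * grad_f(xi_i)
    return abs(xi - xi0 * math.exp(-T))

for lam in (0.1, 0.05, 0.025):
    print(lam, descent_error(lam))  # error shrinks roughly linearly in lam
```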