Connection between difference quotient and gradient descent update rule

The (backward) difference quotient of a function $w$ is defined as

$$\frac{w(t)-w(t-dt)}{dt} = \frac{w_{t}-w_{t-1}}{dt}$$

and, as $dt$ approaches zero, it gives the derivative $w'(t)$ of the function $w$.

Now, the update rule in gradient descent is defined as

$$w_{t} = w_{t-1} - \eta \frac{\partial L}{\partial w_{t-1}}$$

I wonder whether $dt$ is the same as $\eta$. Is there any connection between these two expressions? And is it possible to derive the gradient descent update rule from the difference quotient?

Best answer

Yes, your expressions already hint at the connection. For a finite but small $dt \equiv \eta$ you have

$$ w' \approx \frac{w_t - w_{t-1}}{\eta} $$

or equivalently

$$ w_t \approx w_{t - 1} \color{red}{+} \eta w' $$

which has the same form as your gradient descent update in 1D, up to the sign. Here's the catch though: the gradient $\partial L / \partial w$ points in the direction of steepest increase of the loss $L$, so if you're trying to minimize $L$ (hence the name gradient descent) you want $w$ to move in the opposite direction. That is, you choose $w' = -\partial L / \partial w$, evaluated at $w_{t-1}$, which flips the sign $\color{red}{+}$ to $\color{red}{-}$:

$$ w_t \approx w_{t - 1} \color{red}{-} \eta \frac{\partial L}{\partial w_{t-1}} $$
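If it helps to see this numerically, here is a minimal sketch. It is not part of the original derivation and rests on some assumptions of mine: the loss $L(w) = (w - 3)^2$, the starting point $w_0 = 0$, and the helper names `grad_L`, `gradient_descent` and `gradient_flow_exact` are all chosen purely for illustration. It treats gradient descent as stepping the equation $w'(t) = -\partial L / \partial w$ forward with time step $dt = \eta$, and compares the result with the exact solution of that equation at time $T = 1$.

```python
import math


def grad_L(w):
    # Gradient of the illustrative loss L(w) = (w - 3)^2 (an assumption).
    return 2.0 * (w - 3.0)


def gradient_descent(w0, eta, steps):
    # Each iteration is w_t = w_{t-1} - eta * dL/dw_{t-1},
    # i.e. one finite-difference step of w'(t) = -dL/dw with dt = eta.
    w = w0
    for _ in range(steps):
        w = w - eta * grad_L(w)
    return w


def gradient_flow_exact(w0, t):
    # Exact solution of w'(t) = -2 (w - 3) for this particular loss:
    # w(t) = 3 + (w0 - 3) * exp(-2 t).
    return 3.0 + (w0 - 3.0) * math.exp(-2.0 * t)


T = 1.0   # total "time" covered, so that steps * eta = T
w0 = 0.0  # illustrative starting point

errors = []
for eta in (0.1, 0.01, 0.001):
    steps = round(T / eta)
    w_gd = gradient_descent(w0, eta, steps)
    w_exact = gradient_flow_exact(w0, T)
    errors.append(abs(w_gd - w_exact))
    print(f"eta={eta}: gradient descent={w_gd:.6f}, exact flow={w_exact:.6f}")
```

As $\eta$ shrinks, the gap between the gradient descent iterate and the continuous-time solution shrinks too, which is the sense in which $\eta$ plays the role of $dt$.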