I am trying to understand, conceptually, the gradient descent update rule $$\theta_1 = \theta_0 - \alpha \nabla_{\theta} J(\theta_0)$$
where $J(\theta)$ is the function that is being minimized.
For as long as I can remember, I was taught that the derivative/gradient is the "ratio of a small change in $y$ to the small change in $x$ that produced it."
From that I get
$$\Delta y \approx \Delta x \frac{dy}{dx}$$
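To make this concrete, here is a toy numerical check (my own example, with $y = x^2$, not anything from the rule itself) that the approximation behaves the way I expect:

```python
# Numerical check of the linear approximation dy ≈ dx * dy/dx,
# using the hypothetical example function y = x^2 (so dy/dx = 2x).
def y(x):
    return x ** 2

def dy_dx(x):
    return 2 * x

x0 = 3.0
dx = 0.01                        # small change applied to x
actual_dy = y(x0 + dx) - y(x0)   # true change in y
approx_dy = dx * dy_dx(x0)       # predicted change: dx * (dy/dx)

print(actual_dy)  # roughly 0.0601
print(approx_dy)  # roughly 0.06
```

So far so good: the product of a change in $x$ with the derivative gives (approximately) a change in $y$.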
But the gradient descent rule seems to defy that definition. Following it, I interpret the term $$\alpha \nabla_{\theta}J(\theta_0)$$ as giving (approximately) a change in $J$.
Basically, $$J_1-J_0 = \Delta J \approx \alpha \nabla_{\theta}J(\theta_0)$$
If we follow this logic, the units are mismatched, because the expression $\alpha \nabla_{\theta}J(\theta_0)$, which by my reading carries the units of $J$, is subtracted from $\theta_0$. It is as if $J$ were oranges and $\theta$ were apples, and we wanted to relate oranges to apples: we would be subtracting the wrong quantities from each other.
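Here is my confusion in concrete form, using a made-up example $J(\theta) = \theta^2$ with $\alpha = 0.1$ (neither is from any particular problem):

```python
# Concrete version of my confusion, with the hypothetical example
# J(theta) = theta^2, so grad J = 2*theta.
def J(theta):
    return theta ** 2

def grad_J(theta):
    return 2 * theta

theta0 = 3.0
alpha = 0.1

step = alpha * grad_J(theta0)    # the term the rule subtracts from theta0
theta1 = theta0 - step           # the update rule as written
delta_J = J(theta1) - J(theta0)  # the actual change in J

print(step)     # roughly 0.6, yet it is treated as a change in theta
print(delta_J)  # the actual change in J is a different number entirely
```

The quantity being subtracted from $\theta_0$ does not equal the actual change in $J$, which is what leads me to the question below.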
So can anyone help me understand why this isn't the correct way of looking at the update rule? Wouldn't it make more sense, dimensionally, to write the rule as $$\theta_1 = \theta_0 - \alpha (\nabla_{\theta}J(\theta_0))^{-1}?$$