Considering f a function whose minimum we want to find, a simple gradient descent algorithm would have
x_new = x_old - step * df/dx
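For concreteness, here is a minimal sketch of that update rule in Python (the function, its derivative, and the step value are arbitrary choices for illustration):

```python
# One-dimensional gradient descent using the update x_new = x_old - step * df/dx.
# f(x) = (x - 3)**2 is an arbitrary example function with its minimum at x = 3.

def df_dx(x):
    # derivative of f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0       # starting point
step = 0.1    # step size (learning rate), chosen arbitrarily here
for _ in range(100):
    x = x - step * df_dx(x)

print(x)  # converges toward 3
```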
If x is measured in [input] units and f(x) is measured in [output] units, then df/dx is measured in [output] / [input], so the above equation suggests the step is measured in [input**2] / [output].
How should one interpret this? I previously naively thought the step was adimensional, but apparently not.
Now the equation looks a bit odd to me, reading like
x_new = x_old - small_change_in_y
Which it certainly isn't.
The equation still ticks a number of requirements though (moves x in the right direction and slows when the optimum is near)
Thanks!
Perhaps this will help. This answer is assuming that step refers to the step length.
You are correct that x_new and x_old are measured in the input space, and f(x) is measured in the output space. However, we are considering $\frac{df}{dx}$ and not $f$, so let's look closer at $\frac{df}{dx}$.
Just like $f$, $\frac{df}{dx}$ is a function on the input space, so let me write it as $\frac{df}{dx}(w)$, where $w$ is some point in the input space. Intuitively, if we are currently at $w$, then $\frac{df}{dx}(w)$ points in the direction in the input space in which $f$ increases the most. More precisely, $\frac{df}{dx}(w)$ is the gradient of $f$ at $w$.

As an example, consider $f:\mathbb{R}^2\to \mathbb{R}$ defined by $$f\begin{pmatrix}x\\y\end{pmatrix}= x^2+y.$$ The gradient is $$\frac{df}{d_{x,y}}\begin{pmatrix}w\\z\end{pmatrix}= \begin{pmatrix}2w\\1\end{pmatrix}.$$ This says that at the point $\begin{pmatrix}w\\z\end{pmatrix}$, the direction of steepest ascent is $\begin{pmatrix}2w\\1\end{pmatrix}$. For example, if we are currently at the iterate $x_{old} = \begin{pmatrix}0\\0\end{pmatrix}$, then the direction to steer the algorithm in order to ascend the quickest is $$\frac{df}{d_{x,y}}\begin{pmatrix}0\\0\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}.$$

Therefore, back to your question, the update formula reads like this:
x_old and x_new are points in the input space
df/dx (better written as $\frac{df}{dx}(x_{old})$) is a direction in the input space
step is just a scalar telling you how far to move in that direction.
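To make the answer's 2-D example concrete, here is a small Python sketch that evaluates the gradient of $f(x,y) = x^2 + y$ at the origin and performs one descent update (the step value is an arbitrary choice):

```python
# Checking the 2-D example f(x, y) = x**2 + y from the answer above.
# Gradient of f is (2x, 1); at the origin it is (0, 1), the steepest-ascent
# direction, so a descent step moves in the opposite direction.

def grad_f(w, z):
    # gradient of f(x, y) = x**2 + y
    return (2 * w, 1.0)

x_old = (0.0, 0.0)
g = grad_f(*x_old)          # gradient at (0, 0)

step = 0.5                  # arbitrary step length for illustration
x_new = (x_old[0] - step * g[0],
         x_old[1] - step * g[1])

print(g)      # (0.0, 1.0): direction of steepest ascent at the origin
print(x_new)  # (0.0, -0.5): the update moves opposite the gradient
```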