Considering f a function whose minimum we want to find, a simple gradient descent algorithm would have
x_new = x_old - step * df/dx
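For concreteness, here is a minimal sketch of that update rule in Python (the function, its derivative, and the step value are arbitrary choices for illustration):

```python
# One-dimensional gradient descent using the update x_new = x_old - step * df/dx.
# f(x) = (x - 3)**2 is an arbitrary example function with its minimum at x = 3.

def df_dx(x):
    # derivative of f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0       # starting point
step = 0.1    # step size (learning rate), chosen arbitrarily here
for _ in range(100):
    x = x - step * df_dx(x)

print(x)  # converges toward 3
```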
If x is measured in [input] units and f(x) is measured in [output] units, then df/dx is measured in [output] / [input], so the above equation suggests the step is measured in [input**2] / [output].
How should one interpret this? I previously naively thought the step was adimensional, but apparently not.
Now the equation looks a bit odd to me, reading like
x_new = x_old - small_change_in_y
Which it certainly isn't.
The equation still ticks a number of requirements though (moves x in the right direction and slows when the optimum is near)
Thanks!
Perhaps this will help. This answer is assuming that step refers to the step length.
You are correct that x_new and x_old are measured in the input space, and f(x) is measured in the output space. However, we are considering $\frac{df}{dx}$ and not $f$, so let's look closer at $\frac{df}{dx}$.
Just like $f$, $\frac{df}{dx}$ is a function on the input space, so let me write it as $\frac{df}{dx}(w)$, where $w$ is some point in the input space. Intuitively, if we are currently at $w$, then $\frac{df}{dx}(w)$ points in the direction in the input space in which $f$ increases the most. More precisely, $\frac{df}{dx}(w)$ is the gradient of $f$ at $w$.

As an example, consider $f:\mathbb{R}^2\to \mathbb{R}$ defined by $$f\begin{pmatrix}x\\y\end{pmatrix}= x^2+y.$$ The gradient is $$\frac{df}{d_{x,y}}\begin{pmatrix}w\\z\end{pmatrix}= \begin{pmatrix}2w\\1\end{pmatrix}.$$ This says that at the point $\begin{pmatrix}w\\z\end{pmatrix}$, the direction of steepest ascent is $\begin{pmatrix}2w\\1\end{pmatrix}$. For example, if we are currently at the iterate $x_{old} = \begin{pmatrix}0\\0\end{pmatrix}$, then the direction to steer the algorithm in order to ascend the quickest is $$\frac{df}{d_{x,y}}\begin{pmatrix}0\\0\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}.$$

Therefore, back to your question, the update formula reads like this:
x_old and x_new are points in the input space
df/dx (better written as $\frac{df}{dx}(x_{old})$) is a direction in the input space
step is just a scalar telling you how far to move in that direction.
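To make the answer's 2-D example concrete, here is a small Python sketch that evaluates the gradient of $f(x,y) = x^2 + y$ at the origin and performs one descent update (the step value is an arbitrary choice):

```python
# Checking the 2-D example f(x, y) = x**2 + y from the answer above.
# Gradient of f is (2x, 1); at the origin it is (0, 1), the steepest-ascent
# direction, so a descent step moves in the opposite direction.

def grad_f(w, z):
    # gradient of f(x, y) = x**2 + y
    return (2 * w, 1.0)

x_old = (0.0, 0.0)
g = grad_f(*x_old)          # gradient at (0, 0)

step = 0.5                  # arbitrary step length for illustration
x_new = (x_old[0] - step * g[0],
         x_old[1] - step * g[1])

print(g)      # (0.0, 1.0): direction of steepest ascent at the origin
print(x_new)  # (0.0, -0.5): the update moves opposite the gradient
```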