Effect of the Scale of Data and Objective Function on the Convergence of Gradient Descent

One way to tune the step size in gradient descent is via backtracking line search.

backtracking line search (with parameters $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$)

starting at $t = 1$, repeat $t := \beta t$ until $f(x + t\Delta x) < f(x) + \alpha t \nabla f(x)^T \Delta x$
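
For concreteness, here is a minimal sketch of that procedure (the function names and the quadratic test problem are my own illustration, not from a particular reference):

```python
import numpy as np

def backtracking(f, grad_f, x, delta_x, alpha=0.3, beta=0.8):
    """Shrink t until the sufficient-decrease condition
    f(x + t*delta_x) <= f(x) + alpha * t * grad_f(x)^T delta_x holds."""
    t = 1.0
    fx = f(x)
    slope = grad_f(x) @ delta_x  # directional derivative; negative for a descent direction
    while f(x + t * delta_x) > fx + alpha * t * slope:
        t *= beta
    return t

# One gradient-descent step on f(x) = 0.5 * ||x||^2:
f = lambda x: 0.5 * (x @ x)
grad_f = lambda x: x
x = np.array([1.0, 2.0])
delta_x = -grad_f(x)                    # steepest-descent direction
t = backtracking(f, grad_f, x, delta_x)
x_next = x + t * delta_x
```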

There are suggestions about what range of values to use for $\alpha$ and $\beta$; however, none of these discuss the scale of either the data or the loss function. The question is: shouldn't the range of values of the data and of $f$ conceptually have some effect on the choice of $\alpha$ and $\beta$? Is it assumed that the data are normalized? What about the objective value $f$?
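
One hedged way to probe this numerically is to reuse the `backtracking` sketch above and compare the accepted step size for the original problem, for a rescaled objective $cf$, and for rescaled data $cx$ (again, purely an illustrative experiment of my own):

```python
# Reuses f, grad_f, and backtracking() from the sketch above.
c = 100.0
f_c = lambda x: c * f(x)          # objective rescaled by c
grad_c = lambda x: c * grad_f(x)

x = np.array([1.0, 2.0])
t_orig = backtracking(f, grad_f, x, -grad_f(x))
t_obj  = backtracking(f_c, grad_c, x, -grad_c(x))        # scale f by c
t_data = backtracking(f, grad_f, c * x, -grad_f(c * x))  # scale the data by c
print(t_orig, t_obj, t_data)
```

Note that with $\Delta x = -\nabla f(x)$, rescaling $f$ or the data also rescales the search direction, so such a comparison reflects both effects at once.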