During gradient descent, if the objective function's value increases relative to the previous iteration, would replacing the update vector with a vector orthogonal to it be advantageous? Regarding trust regions, Lipschitz-based analysis suggests a step size of $1/L$, where $L$ is the Lipschitz constant of the gradient (for a twice-differentiable $f$, the largest eigenvalue of the Hessian $\nabla^2 f(x)$), but this amounts to the same thing as step halving, i.e. choosing a learning rate $\alpha$.
Is there a commonly used approach these days that guarantees a monotone decrease of $f(x_k)$ at each iteration? I am trying to keep the computational expense down, since I have more than $n = 50{,}000$ records and $p > 20$ variables.
The main goal is to redirect the update downhill whenever an update step points uphill. Without doing a grid search, the trick is knowing that, after any such modification, you are still headed toward the global minimum.
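One standard way to get a guaranteed decrease at every iteration, without a grid search, is backtracking line search with the Armijo (sufficient-decrease) condition. Below is a minimal sketch; the quadratic test function and all parameter values are illustrative choices, not anything specific to your problem.

```python
import numpy as np

def gradient_descent_backtracking(f, grad, x0, alpha0=1.0, beta=0.5,
                                  c=1e-4, tol=1e-8, max_iter=1000):
    """Gradient descent with Armijo backtracking: the trial step is
    shrunk by `beta` until f decreases sufficiently, so f(x_k) is
    nonincreasing across iterations by construction."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = alpha0
        # Armijo condition: accept the step only if it achieves at least
        # a fraction c of the decrease predicted by the linear model.
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= beta
        x = x - alpha * g
    return x

# Hypothetical quadratic example with minimum at (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_min = gradient_descent_backtracking(f, grad, np.array([5.0, 5.0]))
```

The backtracking loop is exactly "step halving" (for `beta=0.5`), but the Armijo test gives it a convergence guarantee: for any gradient-Lipschitz $f$, the accepted step always yields sufficient decrease, and no knowledge of $L$ is required.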
Gradient descent need not always go downhill -- google "non-monotone gradient descent".
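A classic example of a non-monotone method is the Barzilai-Borwein step size: $f$ may rise on individual iterations, yet the method is often much faster than plain gradient descent and converges on strongly convex quadratics. A minimal sketch (the starting step `alpha0` and test function are arbitrary choices for illustration):

```python
import numpy as np

def barzilai_borwein(grad, x0, alpha0=1e-3, tol=1e-8, max_iter=500):
    """Gradient descent with Barzilai-Borwein (BB1) step sizes.
    Non-monotone: f can increase on some iterations while the
    iterates still converge for convex quadratics."""
    x_prev = x0.astype(float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev  # one small seed step to get s and y
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = x - x_prev        # change in iterates
        y = g - g_prev        # change in gradients
        alpha = (s @ s) / (s @ y)  # BB1 step: mimics a secant condition
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Same hypothetical quadratic, minimum at (1, -2).
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_bb = barzilai_borwein(grad, np.array([5.0, 5.0]))
```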
Can you describe your problem a bit: classification, regression, ...? Is the data sparse? What program are you using? Are the gradients smooth or noisy?
I can recommend stochastic gradient descent in scikit-learn (`SGDRegressor` / `SGDClassifier`).
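At the scale you mention ($n > 50{,}000$, $p > 20$), SGD is cheap per pass and usually works well. A minimal sketch on synthetic regression data of roughly that size; the data, seed, and hyperparameters are placeholders, not a recommendation for your problem:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data at roughly the question's scale: n = 50,000, p = 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((50_000, 20))
coef = rng.standard_normal(20)
y = X @ coef + 0.1 * rng.standard_normal(50_000)

# Feature scaling matters a lot for SGD; a pipeline keeps it attached.
model = make_pipeline(StandardScaler(),
                      SGDRegressor(max_iter=1000, tol=1e-4))
model.fit(X, y)
r2 = model.score(X, y)  # training R^2
```

Note that SGD iterations are themselves non-monotone in the full objective: each minibatch step only decreases a noisy estimate of $f$, which connects back to the comment above about descent methods not needing to go downhill at every step.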