During gradient descent, if the objective function's value increases relative to the previous iteration, would replacing the update vector with a vector orthogonal to it be advantageous? Regarding trust regions, Lipschitz-based analysis suggests a step size of $1/L$, where $L$ is the Lipschitz constant of the gradient (for a twice-differentiable $f$, the largest eigenvalue of the Hessian $\nabla^2 f(x)$), but this amounts to the same thing as step halving, i.e. choosing a learning rate $\alpha$.
Is there a commonly used approach these days that guarantees a monotone decrease of $f(x_k)$ at each iteration? I am trying to keep the computational expense down, since I have more than $n = 50{,}000$ records and $p > 20$ variables.
The main goal is to redirect the update downhill whenever an update step points uphill. Without doing a grid search, the trick is knowing that, after any such modification, you are still headed toward the global minimum.
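One standard way to get a guaranteed decrease at every iteration, without a grid search, is backtracking line search with the Armijo (sufficient-decrease) condition. Below is a minimal sketch; the quadratic test function and all parameter values are illustrative choices, not anything specific to your problem.

```python
import numpy as np

def gradient_descent_backtracking(f, grad, x0, alpha0=1.0, beta=0.5,
                                  c=1e-4, tol=1e-8, max_iter=1000):
    """Gradient descent with Armijo backtracking: the trial step is
    shrunk by `beta` until f decreases sufficiently, so f(x_k) is
    nonincreasing across iterations by construction."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = alpha0
        # Armijo condition: accept the step only if it achieves at least
        # a fraction c of the decrease predicted by the linear model.
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= beta
        x = x - alpha * g
    return x

# Hypothetical quadratic example with minimum at (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_min = gradient_descent_backtracking(f, grad, np.array([5.0, 5.0]))
```

The backtracking loop is exactly "step halving" (for `beta=0.5`), but the Armijo test gives it a convergence guarantee: for any gradient-Lipschitz $f$, the accepted step always yields sufficient decrease, and no knowledge of $L$ is required.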
Gradient descent need not always go downhill -- google "non-monotone gradient descent".
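A classic example of a non-monotone method is the Barzilai-Borwein step size: $f$ may rise on individual iterations, yet the method is often much faster than plain gradient descent and converges on strongly convex quadratics. A minimal sketch (the starting step `alpha0` and test function are arbitrary choices for illustration):

```python
import numpy as np

def barzilai_borwein(grad, x0, alpha0=1e-3, tol=1e-8, max_iter=500):
    """Gradient descent with Barzilai-Borwein (BB1) step sizes.
    Non-monotone: f can increase on some iterations while the
    iterates still converge for convex quadratics."""
    x_prev = x0.astype(float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev  # one small seed step to get s and y
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = x - x_prev        # change in iterates
        y = g - g_prev        # change in gradients
        alpha = (s @ s) / (s @ y)  # BB1 step: mimics a secant condition
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Same hypothetical quadratic, minimum at (1, -2).
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_bb = barzilai_borwein(grad, np.array([5.0, 5.0]))
```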
Can you describe your problem a bit: classification, regression, ...? Is the data sparse? What program are you using? Are the gradients smooth or noisy?
I can recommend stochastic gradient descent in scikit-learn (`SGDRegressor` / `SGDClassifier`).
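At the scale you mention ($n > 50{,}000$, $p > 20$), SGD is cheap per pass and usually works well. A minimal sketch on synthetic regression data of roughly that size; the data, seed, and hyperparameters are placeholders, not a recommendation for your problem:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data at roughly the question's scale: n = 50,000, p = 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((50_000, 20))
coef = rng.standard_normal(20)
y = X @ coef + 0.1 * rng.standard_normal(50_000)

# Feature scaling matters a lot for SGD; a pipeline keeps it attached.
model = make_pipeline(StandardScaler(),
                      SGDRegressor(max_iter=1000, tol=1e-4))
model.fit(X, y)
r2 = model.score(X, y)  # training R^2
```

Note that SGD iterations are themselves non-monotone in the full objective: each minibatch step only decreases a noisy estimate of $f$, which connects back to the comment above about descent methods not needing to go downhill at every step.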