Suppose I have some objective function $f(\beta)$ which I would like to minimize over $\beta$. The standard gradient descent update is
$\beta^{(t+1)}=\beta^{(t)}-\alpha \nabla f(\beta^{(t)})$,
where $\alpha$, the step size, may optionally vary with the iteration, written $\alpha^{(t)}$. There are many results about how to choose $\alpha$, be it fixed or varying with time.
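For reference, here is a minimal Python sketch of this update; the quadratic objective and the names `grad_f` and `beta0` are just my own for illustration:

```python
import numpy as np

def gradient_descent(grad_f, beta0, alpha=0.1, n_iter=100):
    """Plain gradient descent with a fixed scalar step size alpha."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        # beta^{(t+1)} = beta^{(t)} - alpha * grad f(beta^{(t)})
        beta = beta - alpha * grad_f(beta)
    return beta

# Example: f(beta) = ||beta||^2 / 2, so grad f(beta) = beta; the minimizer is 0.
beta_star = gradient_descent(lambda b: b, beta0=[3.0, -2.0], alpha=0.1, n_iter=200)
```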
The update above is a special case of
$\beta^{(t+1)}=\beta^{(t)}-A \nabla f(\beta^{(t)})$,
where $A=\alpha I$. I would like to consider a gradient descent algorithm where $A=\operatorname{diag}(\alpha_1,\alpha_2,\dots,\alpha_n)$ -- does this variant of gradient descent have a name? It simply allows each coordinate to have a different step size, yet I have not found anything about it online. I imagine one could do better by having a specific $\alpha_i$ for each coordinate than by using a single $\alpha$ for all of them. Does anybody know of any work that has been done on this?
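For concreteness, here is a sketch of the update I have in mind: the elementwise product with the vector of step sizes implements $A=\operatorname{diag}(\alpha_1,\dots,\alpha_n)$. The ill-conditioned quadratic is a made-up example where per-coordinate steps pay off:

```python
import numpy as np

def diag_gradient_descent(grad_f, beta0, alphas, n_iter=100):
    """Gradient descent with a separate step size per coordinate:
    beta^{(t+1)} = beta^{(t)} - A grad f(beta^{(t)}),  A = diag(alphas)."""
    beta = np.asarray(beta0, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    for _ in range(n_iter):
        # Elementwise product == multiplying the gradient by diag(alphas).
        beta = beta - alphas * grad_f(beta)
    return beta

# Example: an ill-conditioned quadratic f(beta) = (100*b1^2 + b2^2)/2.
# A single alpha must be <= ~0.02 to remain stable in b1, which makes
# progress in b2 very slow; per-coordinate steps avoid that trade-off.
grad = lambda b: np.array([100.0 * b[0], b[1]])
beta_star = diag_gradient_descent(grad, beta0=[1.0, 1.0],
                                  alphas=[0.01, 0.5], n_iter=200)
```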
The point of a line search is that the search is one-dimensional: a single step size is chosen along the gradient direction. Choosing a separate $\alpha_i$ for each coordinate instead corresponds to $n$ line searches along the coordinate axes, so you lose the directional information contained in the gradient.
Alternatively, you could use a fixed diagonal matrix $C=\operatorname{diag}(c_1,\dots,c_n)$ to scale the components of the system and the gradient, i.e., $$ \beta^{(t+1)}=\beta^{(t)}-\alpha C \nabla f(\beta^{(t)}), $$ which again yields a one-dimensional line search over the scalar $\alpha$. Such a $C$ is one choice of preconditioner. However, determining "good" diagonal entries again involves a lot of guesswork. From there, the step to (L-)BFGS is not that large.
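As a sketch of what I mean, assuming positive diagonal entries $c_i$ and a standard Armijo backtracking rule for the scalar $\alpha$ (the names and constants below are mine, not from any particular library):

```python
import numpy as np

def preconditioned_descent(f, grad_f, beta0, c, n_iter=50):
    """Gradient descent with a fixed diagonal preconditioner C = diag(c)
    and a backtracking line search over the single scalar step size alpha."""
    beta = np.asarray(beta0, dtype=float)
    c = np.asarray(c, dtype=float)
    for _ in range(n_iter):
        g = grad_f(beta)
        d = c * g  # C grad f(beta): the preconditioned descent direction
        alpha = 1.0
        # Armijo backtracking: halve alpha until f decreases sufficiently.
        # Since all c_i > 0, d is a descent direction and g.d > 0.
        while f(beta - alpha * d) > f(beta) - 1e-4 * alpha * np.dot(g, d):
            alpha *= 0.5
        beta = beta - alpha * d
    return beta

# Example: the same ill-conditioned quadratic, with C chosen to undo the scaling.
f = lambda b: 0.5 * (100.0 * b[0]**2 + b[1]**2)
grad = lambda b: np.array([100.0 * b[0], b[1]])
beta_star = preconditioned_descent(f, grad, beta0=[1.0, 1.0], c=[0.01, 1.0])
```

With $C$ chosen to match the curvature, as in this example, the preconditioned direction points straight at the minimizer and the line search accepts $\alpha=1$ immediately; with a poor $C$ the method degrades back to plain gradient descent behavior.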