Suppose I have some objective function $f(\beta)$ which I would like to minimize over $\beta$. The standard gradient descent update is
$\beta^{(t+1)}=\beta^{(t)}-\alpha \nabla f(\beta^{(t)})$,
where $\alpha$, the step size, may optionally vary with the iteration, written $\alpha^{(t)}$. There are many results about how to choose $\alpha$, be it fixed or varying with time.
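For reference, here is a minimal Python sketch of this update; the quadratic objective and the names `grad_f` and `beta0` are just my own for illustration:

```python
import numpy as np

def gradient_descent(grad_f, beta0, alpha=0.1, n_iter=100):
    """Plain gradient descent with a fixed scalar step size alpha."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        # beta^{(t+1)} = beta^{(t)} - alpha * grad f(beta^{(t)})
        beta = beta - alpha * grad_f(beta)
    return beta

# Example: f(beta) = ||beta||^2 / 2, so grad f(beta) = beta; the minimizer is 0.
beta_star = gradient_descent(lambda b: b, beta0=[3.0, -2.0], alpha=0.1, n_iter=200)
```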
The update above is a special case of
$\beta^{(t+1)}=\beta^{(t)}-A \nabla f(\beta^{(t)})$,
where $A=\alpha I$. I would like to consider a gradient descent algorithm where $A=\operatorname{diag}(\alpha_1,\alpha_2,\dots,\alpha_n)$ -- does this variant of gradient descent have a name? It simply allows each coordinate to have a different step size, yet I have not found anything about it online. I imagine one could do better by having a specific $\alpha_i$ for each coordinate than by using a single $\alpha$ for all of them. Does anybody know of any work that has been done on this?
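For concreteness, here is a sketch of the update I have in mind: the elementwise product with the vector of step sizes implements $A=\operatorname{diag}(\alpha_1,\dots,\alpha_n)$. The ill-conditioned quadratic is a made-up example where per-coordinate steps pay off:

```python
import numpy as np

def diag_gradient_descent(grad_f, beta0, alphas, n_iter=100):
    """Gradient descent with a separate step size per coordinate:
    beta^{(t+1)} = beta^{(t)} - A grad f(beta^{(t)}),  A = diag(alphas)."""
    beta = np.asarray(beta0, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    for _ in range(n_iter):
        # Elementwise product == multiplying the gradient by diag(alphas).
        beta = beta - alphas * grad_f(beta)
    return beta

# Example: an ill-conditioned quadratic f(beta) = (100*b1^2 + b2^2)/2.
# A single alpha must be <= ~0.02 to remain stable in b1, which makes
# progress in b2 very slow; per-coordinate steps avoid that trade-off.
grad = lambda b: np.array([100.0 * b[0], b[1]])
beta_star = diag_gradient_descent(grad, beta0=[1.0, 1.0],
                                  alphas=[0.01, 0.5], n_iter=200)
```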
The point of a line search is that the search is one-dimensional: a single step size is chosen along the gradient direction. Choosing a separate $\alpha_i$ for each coordinate instead corresponds to $n$ line searches along the coordinate axes, so you lose the directional information contained in the gradient.
Alternatively, you could use a fixed diagonal matrix $C=\operatorname{diag}(c_1,\dots,c_n)$ to scale the components of the system and the gradient, i.e., $$ \beta^{(t+1)}=\beta^{(t)}-\alpha C \nabla f(\beta^{(t)}), $$ which again yields a one-dimensional line search over the scalar $\alpha$. Such a $C$ is one choice of preconditioner. However, determining "good" diagonal entries again involves a lot of guesswork. From there, the step to (L-)BFGS is not that large.
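As a sketch of what I mean, assuming positive diagonal entries $c_i$ and a standard Armijo backtracking rule for the scalar $\alpha$ (the names and constants below are mine, not from any particular library):

```python
import numpy as np

def preconditioned_descent(f, grad_f, beta0, c, n_iter=50):
    """Gradient descent with a fixed diagonal preconditioner C = diag(c)
    and a backtracking line search over the single scalar step size alpha."""
    beta = np.asarray(beta0, dtype=float)
    c = np.asarray(c, dtype=float)
    for _ in range(n_iter):
        g = grad_f(beta)
        d = c * g  # C grad f(beta): the preconditioned descent direction
        alpha = 1.0
        # Armijo backtracking: halve alpha until f decreases sufficiently.
        # Since all c_i > 0, d is a descent direction and g.d > 0.
        while f(beta - alpha * d) > f(beta) - 1e-4 * alpha * np.dot(g, d):
            alpha *= 0.5
        beta = beta - alpha * d
    return beta

# Example: the same ill-conditioned quadratic, with C chosen to undo the scaling.
f = lambda b: 0.5 * (100.0 * b[0]**2 + b[1]**2)
grad = lambda b: np.array([100.0 * b[0], b[1]])
beta_star = preconditioned_descent(f, grad, beta0=[1.0, 1.0], c=[0.01, 1.0])
```

With $C$ chosen to match the curvature, as in this example, the preconditioned direction points straight at the minimizer and the line search accepts $\alpha=1$ immediately; with a poor $C$ the method degrades back to plain gradient descent behavior.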