Coordinate-wise gradient descent converges to least-squares solution

Does anyone know a reference (or perhaps a short proof or argument) for the following claim?

Coordinate-wise gradient descent converges to a least-squares solution.

Coordinate-wise gradient descent: at each iteration, choose the component (coordinate), i.e. the column of a pre-specified design matrix, for which the gradient of the squared-error loss is largest in absolute value. Then estimate the gradient along that coordinate via least squares and take a step in the direction of this estimate of the negative gradient.
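A minimal sketch of the procedure as described, in plain Python. Assumptions that are not in the question: the loss is 0.5 * ||y - X b||^2, the design matrix X has linearly independent columns (ideally standardized to equal norm, in which case "largest gradient component" and "largest loss reduction" select the same column), and `nu` is a hypothetical step-size factor. The function names are illustrative only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def column(X, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in X]


def coordinatewise_ls(X, y, nu=1.0, n_iter=1000, tol=1e-12):
    """Greedy coordinate-wise descent on 0.5 * ||y - X b||^2.

    Assumptions (illustrative, not from the question): X is a list of
    rows with linearly independent columns; nu in (0, 1] scales each step.
    """
    p = len(X[0])
    cols = [column(X, j) for j in range(p)]
    sq_norms = [dot(c, c) for c in cols]
    b = [0.0] * p
    r = list(y)  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        # The negative gradient of the loss w.r.t. b_j is X_j^T r.
        neg_grad = [dot(c, r) for c in cols]
        # Pick the coordinate whose gradient component is largest in magnitude.
        j = max(range(p), key=lambda k: abs(neg_grad[k]))
        if abs(neg_grad[j]) < tol:
            break  # all gradient components are (numerically) zero
        # Least-squares fit of the current residual on column j alone.
        gamma = neg_grad[j] / sq_norms[j]
        step = nu * gamma
        b[j] += step
        # Keep the residual consistent with the updated coefficients.
        r = [ri - step * xij for ri, xij in zip(r, cols[j])]
    return b
```

With nu = 1 each step is an exact minimization along the chosen coordinate (the greedy Gauss-Southwell rule); a smaller nu gives the shrunken steps used in componentwise L2Boosting. Any fixed point has X^T r = 0, which is exactly the normal equations X^T X b = X^T y, so a limit of the iterates is a least-squares solution.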