Convergence of gradient descent with ordinary least squares


I am currently trying to understand the numerical optimisation of the Ordinary Least Squares (OLS) approach to linear regression using gradient descent. Let us define the associated optimisation problem as:

\begin{alignat}{2} \min_{w}\quad & \frac{1}{2}\|y-Xw\|_2^2 \end{alignat}

Using batch gradient descent, one can update the weight vector until the objective function converges within some tolerance, via the following rule, where $X$ is the matrix of observations and $y$ is the vector of corresponding outputs:

\begin{alignat}{2} & w_0\ \text{randomly initialised} \end{alignat}

\begin{alignat}{2} & w_{t+1} = w_t - \alpha_t(X^TXw_t - X^Ty) \end{alignat}
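To make the update rule concrete, here is a minimal NumPy sketch of this batch gradient descent iteration. The data, the constant step size $\alpha = 1/\lambda_{\max}(X^TX)$, and the stopping tolerance are illustrative assumptions, not part of the question:

```python
import numpy as np

# Illustrative synthetic data (an assumption, not from the question)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))            # matrix of observations
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(100)

w = rng.standard_normal(3)                   # w_0 randomly initialised

# One common (sufficient) choice of constant step size:
# alpha = 1 / lambda_max(X^T X), where lambda_max is the largest
# eigenvalue of X^T X.
alpha = 1.0 / np.linalg.eigvalsh(X.T @ X).max()

for t in range(5000):
    grad = X.T @ X @ w - X.T @ y             # gradient of (1/2)||y - Xw||^2
    w_new = w - alpha * grad                 # the update rule above
    if np.linalg.norm(w_new - w) < 1e-10:    # tolerance-based stopping
        break
    w = w_new

# Sanity check against the closed-form least-squares solution
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_ls, atol=1e-6))
```

With this step size the iterates agree with the closed-form solution to high precision, which is what motivates the question about general conditions on $\alpha_t$.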

I am trying to understand under what conditions on the step size $\alpha_t$ the batch gradient descent iteration converges for OLS in general.

Thank you very much for your time!