I've been reading this paper, and in equation 23 the authors consider the optimization problem \begin{align} \hat{u} = \arg \min_{u \in C - x^t} \left\{ \frac{1}{2} \lVert Au \rVert^2_2 - \langle A^T(y-Ax^t), u\rangle \right\} \end{align}
where $x^t$ is the iterate at step $t$, $A$ is an $n\times p$ data matrix, $y$ is an $n\times 1$ vector, and $C$ is the feasible region. The claim is that $\hat{u} = x^{LS} - x^t$, where $x^{LS} = \arg\min_x \lVert Ax - y \rVert_2$ is the least-squares solution.
Would someone be able to guide me through how the authors got this result?
Taking the gradient of the objective function yields $$ \frac{\partial \phi}{\partial \mathbf{u}} = \mathbf{A}^T \mathbf{Au} - \mathbf{A}^T (\mathbf{y}-\mathbf{A}\mathbf{x}^{[t]}) = \mathbf{A}^T \left[ \mathbf{A}(\mathbf{u}+\mathbf{x}^{[t]}) - \mathbf{y} \right]. $$ Setting this to zero gives $\mathbf{A}^T \mathbf{A}(\mathbf{u}+\mathbf{x}^{[t]}) = \mathbf{A}^T \mathbf{y}$, which are exactly the normal equations of the least-squares problem in the variable $\mathbf{u}+\mathbf{x}^{[t]}$. If $\mathbf{A}$ has full column rank, the unique solution is $$ \mathbf{u} = \mathbf{x}_{LS} - \mathbf{x}^{[t]}. $$ Note this is the *unconstrained* stationary point; since the objective is convex, it is the minimizer whenever it lies in the feasible set $C - \mathbf{x}^{[t]}$.
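A quick numerical sanity check of this identity (the dimensions and data here are made up, the constraint set $C$ is ignored so the unconstrained minimizer is assumed feasible, and $A$ is assumed to have full column rank so $x_{LS}$ is unique):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 5  # arbitrary sizes for illustration
A = rng.standard_normal((n, p))  # full column rank almost surely
y = rng.standard_normal(n)
x_t = rng.standard_normal(p)     # some current iterate x^t

# Least-squares solution x_LS = argmin_x ||Ax - y||_2
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

# Objective phi(u) = 1/2 ||Au||^2 - <A^T (y - A x^t), u>
def phi(u):
    return 0.5 * np.dot(A @ u, A @ u) - (A.T @ (y - A @ x_t)) @ u

# Stationary point from the normal equations: A^T A u = A^T (y - A x^t)
u_hat = np.linalg.solve(A.T @ A, A.T @ (y - A @ x_t))

print(np.allclose(u_hat, x_ls - x_t))           # u_hat equals x_LS - x^t
print(phi(u_hat) <= phi(u_hat + 0.1 * rng.standard_normal(p)))  # local check
```

Since the objective is a convex quadratic, the stationary point is the global unconstrained minimizer, which the perturbation check above illustrates.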