I am following a book that has a part on numerical optimization techniques. To illustrate the Karush-Kuhn-Tucker (KKT) theorem, they give the following example:
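To restate the setup as I understand it (if I recall the example correctly, it is the constrained least-squares problem):

$$\min_{x} \; f(x) = \frac{1}{2}\lVert Ax - b\rVert_2^2 \quad \text{subject to} \quad x^T x \leq 1,$$

with the generalized Lagrangian

$$L(x, \lambda) = f(x) + \lambda\,(x^T x - 1),$$

so that setting $\nabla_x L = 0$ gives the optimal $x$ for a fixed $\lambda$ as $x = (A^T A + 2\lambda I)^{-1} A^T b$.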
When the unconstrained solution $x = A^+ b$ does not lie in the feasible region, they apply an iterative approach to find the solution whose norm is equal to $1$. They repeatedly increase the value of $\lambda$, which shrinks the norm of the optimal $x$, until that norm converges to $1$. What I do not understand is that they say they apply gradient ascent to $\lambda$.

What is the logic behind applying gradient ascent in this scheme? The aim is to find a point where $x$ minimizes $L(x,\lambda)$ and satisfies $x^T x \leq 1$, but I do not see how applying gradient ascent only to $\lambda$, by computing $\dfrac{\partial L(x,\lambda)}{\partial \lambda}$ and then updating the optimal $x$, leads to convergence. What is really going on here?
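To make my reading of the procedure concrete, here is a minimal sketch in Python/NumPy, assuming the least-squares setup above (the matrix $A$, the vector $b$, the step size, and the iteration count are all made up for illustration): at each step it solves for the minimizer of $L$ over $x$ at the current $\lambda$, then takes a gradient-ascent step on $\lambda$ using $\dfrac{\partial L}{\partial \lambda} = x^T x - 1$.

```python
import numpy as np

# Assumed objective: f(x) = 0.5 * ||A x - b||^2, constraint x^T x <= 1,
# Lagrangian: L(x, lam) = f(x) + lam * (x^T x - 1).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([2.0, 3.0, 4.0])   # chosen so the unconstrained minimizer is infeasible

lam = 0.0        # Lagrange multiplier, must stay >= 0
eta = 0.05       # step size for the ascent on lambda (may need tuning)

for _ in range(1000):
    # Minimize L over x for the current lambda: (A^T A + 2*lam*I) x = A^T b
    x = np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(A.shape[1]), A.T @ b)
    # Partial derivative of L with respect to lambda is x^T x - 1; ascend on it
    lam = max(0.0, lam + eta * (x @ x - 1.0))   # keep lambda non-negative

print(np.linalg.norm(x), lam)   # ||x|| ends up close to 1, so the constraint is active
```

In this sketch, increasing $\lambda$ whenever $x^T x > 1$ strengthens the penalty term and shrinks the next optimal $x$, which seems to match the behaviour the book describes, but I still do not see why this counts as gradient *ascent* and why it converges.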