The way projected gradient descent has been presented to me (at least in Levitin and Polyak) is: first take the gradient step $\tilde\theta_{t+1} = \theta_t - \eta_t \nabla f(\theta_t)$, and then project onto your convex set $C$: $\theta_{t+1} = P_C(\tilde\theta_{t+1})$. Intuitively, I am wondering why projection is necessary after every step. Shouldn't you be able to just carry out plain gradient descent and, after a long enough $t$, project once at the end — isn't that close enough to the optimal $\theta^*$? Or is there a counterexample of a convex set for which this fails?
Projected gradient descent, why project at every iteration?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Projected gradient descent approximates steepest descent *inside* $C$ in order to find the minimum of $f$ there. Plain gradient descent, without projection, follows the unconstrained descent path, which in general leaves $C$ entirely — so projecting only once at the end need not land anywhere near the constrained minimizer. Here is a counterexample:
Example. Consider the function $f(x,y) = 0.001 x^2 + y^2$, and $C = B_1(10, 2)$, the closed unit ball centred at $(10, 2)$. The unconstrained minimizer of $f$ is the origin, so unprojected gradient descent drifts out of $C$ towards $(0,0)$; projecting only at the end then yields (approximately) the point of $C$ closest to the origin, $P_C(0,0) = (10,2)\bigl(1 - \tfrac{1}{\sqrt{104}}\bigr) \approx (9.02, 1.80)$, where $f \approx 3.33$. The constrained minimizer, however, is approximately $(9.99, 1.00)$, where $f \approx 1.10$. Projecting at every iteration keeps the iterates inside $C$ and on track towards the constrained minimizer; projecting once at the end does not.
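A small numeric sketch of this example (the step size $\eta = 0.4$, the iteration count, and the starting point are my own hypothetical choices, not from the question):

```python
import numpy as np

center = np.array([10.0, 2.0])  # centre of the closed unit ball C

def f(p):
    # f(x, y) = 0.001 x^2 + y^2
    return 0.001 * p[0] ** 2 + p[1] ** 2

def grad(p):
    # Gradient of f
    return np.array([0.002 * p[0], 2.0 * p[1]])

def project(p):
    # Euclidean projection onto the closed unit ball centred at `center`
    d = p - center
    n = np.linalg.norm(d)
    return p if n <= 1.0 else center + d / n

eta, steps = 0.4, 500          # hypothetical step size and iteration count
start = np.array([10.0, 2.0])  # hypothetical start: the centre of C

# Projected gradient descent: project after every step
p = start.copy()
for _ in range(steps):
    p = project(p - eta * grad(p))

# Plain gradient descent, projecting only once at the end
q = start.copy()
for _ in range(steps):
    q = q - eta * grad(q)
q = project(q)

print("project every step:", p, f(p))   # near (9.99, 1.00), f ≈ 1.10
print("project at the end:", q, f(q))   # a strictly worse point of C
```

The unprojected iterates drift towards the unconstrained minimizer $(0,0)$, so the final projection lands at a point of $C$ with a noticeably larger value of $f$ than the constrained minimizer that projected gradient descent converges to.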