Gradient of the projection operator in projected gradient descent


Projected gradient descent is sometimes used to solve constrained optimization problems. Since the projection onto the constraint set is itself a function, one might think that applying it requires computing the gradient of this projection function via the chain rule (as in backpropagation). This turns out not to be the case, but can someone give some intuition or rigorous reasoning for why?
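To make the question concrete, here is a minimal sketch of projected gradient descent in NumPy for the toy problem of minimizing $\|x - b\|^2$ subject to $\|x\| \le 1$ (the function names `grad_f` and `project`, the point `b`, and the step size are all illustrative choices, not from any particular source). Note that the gradient is evaluated at the current iterate *before* the projection is applied, so the code never differentiates through `project`:

```python
import numpy as np

# Toy problem:  min_x ||x - b||^2  subject to  ||x|| <= 1,
# where b lies outside the unit ball, so the constraint is active.
b = np.array([3.0, 4.0])  # unconstrained minimizer; norm 5 > 1

def grad_f(x):
    # Gradient of f(x) = ||x - b||^2
    return 2.0 * (x - b)

def project(x):
    # Euclidean projection onto the closed unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x = np.zeros(2)
step = 0.1
for _ in range(100):
    # Gradient step on f, then projection back onto the feasible set.
    # The projection acts on the *updated* point; no chain rule through
    # `project` appears anywhere in the update.
    x = project(x - step * grad_f(x))

print(np.round(x, 4))
```

For this problem the constrained minimizer is $b / \|b\| = (0.6, 0.8)$, and the iterates converge to it even though the projection was never differentiated, which is the behavior the question is asking to justify.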