Why does projected gradient descent only consider the sign of the gradient?


I am talking about the projected gradient descent (PGD) method in the context of adversarial machine learning. The paper Towards Deep Learning Models Resistant to Adversarial Attacks defines PGD using the following formula on page 4.

$$x^{t+1} = \Pi_{x+S} \left(x^t + \alpha \operatorname*{sgn}(\nabla_x L(\theta, x, y))\right).$$

Why can't we remove the $\operatorname*{sgn}$ operator, and use the following?

$$x^{t+1} = \Pi_{x+S} \left(x^t + \alpha \nabla_x L(\theta, x, y)\right).$$

We can always project the iterate back onto the $\ell_\infty$ ball by clipping it after the gradient step, so the constraint isn't really a concern.
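To make the comparison concrete, here is a minimal NumPy sketch (not the paper's code) contrasting the two update rules on a toy quadratic loss. The loss, the `target` vector, and the step sizes are all hypothetical choices for illustration; the point is that the sign-based step moves every coordinate by the same amount $\alpha$, while the raw-gradient step is dominated by whichever coordinate happens to have the largest gradient.

```python
import numpy as np

# Toy "loss" L(x) = 0.5 * ||x - target||^2, so grad L(x) = x - target.
# (Purely illustrative; the paper's L is the network's training loss.)
def grad_L(x, target):
    return x - target

def pgd_step_sign(x, x0, eps, alpha, target):
    """Sign-based step: every coordinate moves by exactly alpha,
    then the iterate is clipped back into the L_inf ball around x0."""
    x_new = x + alpha * np.sign(grad_L(x, target))
    return np.clip(x_new, x0 - eps, x0 + eps)

def pgd_step_raw(x, x0, eps, alpha, target):
    """Raw-gradient step: the move in each coordinate scales with the
    gradient's magnitude there, then clip back into the L_inf ball."""
    x_new = x + alpha * grad_L(x, target)
    return np.clip(x_new, x0 - eps, x0 + eps)

x0 = np.zeros(3)
target = np.array([10.0, 0.1, -5.0])  # coordinates with very different gradient scales
eps, alpha = 0.05, 0.01

x_sign = pgd_step_sign(x0, x0, eps, alpha, target)
x_raw = pgd_step_raw(x0, x0, eps, alpha, target)

print(x_sign)  # every coordinate moved by exactly alpha = 0.01
print(x_raw)   # moves proportional to the gradient; tiny in coordinate 1
```

With the sign step, coordinate 1 (gradient magnitude 0.1) moves just as far as coordinate 0 (gradient magnitude 10), so the attack makes steady progress toward the $\ell_\infty$ boundary in every direction; with the raw step, coordinate 1 barely moves at all unless $\alpha$ is made large, at which point the other coordinates immediately saturate at the clip boundary.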