Why does projected gradient descent only consider the sign of the gradient?


I am talking about the projected gradient descent (PGD) method in the context of adversarial machine learning. The paper Towards Deep Learning Models Resistant to Adversarial Attacks defines PGD using the following formula on page 4.

$$x^{t+1} = \Pi_{x+S} \left(x^t + \alpha \operatorname*{sgn}(\nabla_x L(\theta, x, y))\right).$$

Why can't we remove the $\operatorname*{sgn}$ operator, and use the following?

$$x^{t+1} = \Pi_{x+S} \left(x^t + \alpha \nabla_x L(\theta, x, y)\right).$$

We can always project the iterate back onto the $\ell_\infty$ ball by clipping it after the gradient step, so the constraint isn't really a concern.
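To make the comparison concrete, here is a minimal NumPy sketch (not the paper's code) contrasting the two update rules on a toy quadratic loss. The loss, the `target` vector, and the step sizes are all hypothetical choices for illustration; the point is that the sign-based step moves every coordinate by the same amount $\alpha$, while the raw-gradient step is dominated by whichever coordinate happens to have the largest gradient.

```python
import numpy as np

# Toy "loss" L(x) = 0.5 * ||x - target||^2, so grad L(x) = x - target.
# (Purely illustrative; the paper's L is the network's training loss.)
def grad_L(x, target):
    return x - target

def pgd_step_sign(x, x0, eps, alpha, target):
    """Sign-based step: every coordinate moves by exactly alpha,
    then the iterate is clipped back into the L_inf ball around x0."""
    x_new = x + alpha * np.sign(grad_L(x, target))
    return np.clip(x_new, x0 - eps, x0 + eps)

def pgd_step_raw(x, x0, eps, alpha, target):
    """Raw-gradient step: the move in each coordinate scales with the
    gradient's magnitude there, then clip back into the L_inf ball."""
    x_new = x + alpha * grad_L(x, target)
    return np.clip(x_new, x0 - eps, x0 + eps)

x0 = np.zeros(3)
target = np.array([10.0, 0.1, -5.0])  # coordinates with very different gradient scales
eps, alpha = 0.05, 0.01

x_sign = pgd_step_sign(x0, x0, eps, alpha, target)
x_raw = pgd_step_raw(x0, x0, eps, alpha, target)

print(x_sign)  # every coordinate moved by exactly alpha = 0.01
print(x_raw)   # moves proportional to the gradient; tiny in coordinate 1
```

With the sign step, coordinate 1 (gradient magnitude 0.1) moves just as far as coordinate 0 (gradient magnitude 10), so the attack makes steady progress toward the $\ell_\infty$ boundary in every direction; with the raw step, coordinate 1 barely moves at all unless $\alpha$ is made large, at which point the other coordinates immediately saturate at the clip boundary.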