While reading this article about the Frank-Wolfe algorithm, I did not understand why the Frank-Wolfe algorithm is projection-free, while the gradient descent is not. I think the problem is, that I do not understand what it means for an operation to be projection free.
Within linear algebra, I understand that a transformation is a projection if it is idempotent. I also understand that it is good to have projection-free algorithms (it makes it possible to reduce the complexity) - but I do not know why this is the case. Is it because the Frank-Wolfe uses a linear optimization oracle?
So my question is this: When is an operation projection-free, and why can the complexity of projection-free algorithms be reduced?