In a minimization problem, i.e., $\min f(\mathbf{x})$, the procedure is
- Start with an arbitrary $\mathbf{x}_1$.
- At the $i$-th iteration, compute the gradient $\nabla f(\mathbf{x}_i)$.
- Find the next point $\mathbf{x}_{i+1}=\mathbf{x}_i-\gamma \nabla f(\mathbf{x}_i)$, where $\gamma>0$ is a step size chosen to minimize $f(\mathbf{x}_i-\gamma \nabla f(\mathbf{x}_i))$ (exact line search). Note the minus sign: for minimization we step *against* the gradient.
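The procedure above can be sketched on a quadratic objective, where exact line search has a closed form. This is an illustrative sketch, not from the original post: for $f(\mathbf{x})=\tfrac12\mathbf{x}^\top A\mathbf{x}-\mathbf{b}^\top\mathbf{x}$ with $A$ symmetric positive definite, the gradient is $\mathbf{g}=A\mathbf{x}-\mathbf{b}$ and the minimizing step is $\gamma=\mathbf{g}^\top\mathbf{g}/\mathbf{g}^\top A\mathbf{g}$.

```python
import numpy as np

def steepest_descent(A, b, x, iters=50):
    # Minimize f(x) = 0.5 x^T A x - b^T x  (A symmetric positive definite).
    # Gradient: g = A x - b.  Exact line search on a quadratic gives the
    # closed-form step gamma = (g.g) / (g.A.g), and the update moves
    # AGAINST the gradient: x <- x - gamma * g.
    for _ in range(iters):
        g = A @ x - b
        if np.linalg.norm(g) < 1e-12:   # converged: gradient vanishes
            break
        gamma = (g @ g) / (g @ (A @ g))
        x = x - gamma * g
    return x

# Toy problem: the minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = steepest_descent(A, b, np.zeros(2))
```

The names `steepest_descent`, `A`, `b` are my own; the closed-form $\gamma$ only holds for quadratics, and in general the line search is a one-dimensional minimization.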
I think both the "Frank–Wolfe algorithm" and the "steepest (gradient) descent algorithm" are the same as the above. What is the difference?
The main difference is that the procedure above is (unconstrained) steepest descent in the Euclidean norm, while Frank–Wolfe is a projection-free method for *constrained* problems: instead of stepping along $-\nabla f(\mathbf{x}_i)$, it minimizes the linearization $\langle \mathbf{s}, \nabla f(\mathbf{x}_i)\rangle$ over the feasible set, obtaining a vertex $\mathbf{s}_i$, and then moves toward it, $\mathbf{x}_{i+1} = \mathbf{x}_i + \gamma(\mathbf{s}_i - \mathbf{x}_i)$. Steepest descent with respect to a general norm also differs from plain gradient descent in that its direction involves a scaling that depends on the dual norm of the gradient.
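To make the contrast concrete, here is a hedged sketch of Frank–Wolfe over the probability simplex, a set for which the linear minimization oracle is trivial (the minimizing vertex is the coordinate with the smallest gradient component). The objective, step rule $\gamma_k = 2/(k+2)$, and all names are illustrative assumptions, not from the original post:

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x, iters=20000):
    # Frank-Wolfe over the probability simplex {x : x >= 0, sum(x) = 1}.
    # The linear minimization oracle argmin_{s in simplex} <s, grad>
    # is solved by the vertex e_j with the smallest gradient component;
    # the update is a convex combination, so iterates stay feasible
    # with no projection step.
    for k in range(iters):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0           # LMO solution: a simplex vertex
        gamma = 2.0 / (k + 2.0)         # standard open-loop step size
        x = x + gamma * (s - x)         # move toward the vertex
    return x

# Toy objective f(x) = ||x - p||^2 with p inside the simplex, so x* = p.
p = np.array([0.2, 0.3, 0.5])
x = frank_wolfe_simplex(lambda x: 2.0 * (x - p), np.ones(3) / 3)
```

Compare with the procedure in the question: no line search along $-\nabla f$ and no projection; feasibility is maintained purely by the convex-combination update.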