I have been using the soft SVM recently. I found good slides at http://www.cs.berkeley.edu/~jordan/courses/281B-spring04/lectures/lec6.pdf.
For the soft SVM, suppose there are $n$ samples. Let $f(x_i)=w^Tx_i+b$, where $x_i$ is the feature vector of the $i^{\text{th}}$ sample and $y_i$ is its label. The objective is $$\min_{w,b} \frac{1}{2}w^Tw+ C\sum_{i=1}^{n}\max(0,1-y_if(x_i))$$
The slides say "the second term is piecewise linear and convex". I can understand that. But I am confused by the claim that "convergence is not guaranteed with steepest descent methods."
I thought that for a convex function, the steepest descent method is guaranteed to reach the global minimum. Why would convexity alone not be enough here?
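To make the question concrete, here is a minimal sketch (my own code, not from the slides) of the subgradient method on the soft-SVM objective above. Because the hinge term $\max(0, 1-y_if(x_i))$ is not differentiable at the kink, this uses a subgradient together with a diminishing step size $\eta_t = 1/t$; all variable names and the toy data are my assumptions:

```python
import numpy as np

# Toy linearly separable data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n, d = 100, 2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

C = 1.0  # soft-margin penalty from the objective above

def objective(w, b):
    """0.5 * w^T w + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    margins = y * (X @ w + b)
    return 0.5 * w @ w + C * np.maximum(0.0, 1.0 - margins).sum()

w = np.zeros(d)
b = 0.0
for t in range(1, 1001):
    margins = y * (X @ w + b)
    active = margins < 1.0          # samples where the hinge is nonzero
    # A subgradient of the objective (at the kink, 0 is a valid choice):
    g_w = w - C * (y[active, None] * X[active]).sum(axis=0)
    g_b = -C * y[active].sum()
    eta = 1.0 / t                   # diminishing step size
    w -= eta * g_w
    b -= eta * g_b

print(objective(w, b))
```

Even in this sketch, a fixed step size can make the iterates oscillate around a kink of the piecewise-linear term instead of settling, which is presumably what the slides' remark about steepest descent is pointing at.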