Page 7 of https://web.stanford.edu/class/cs229t/scribe_notes/10_10_final.pdf
I've tried finding a proof online, but haven't been able to find one. In the notes linked above, which are part of Stanford's Statistical Learning Theory course, the hinge loss is defined as $$ \ell(z,h) = \max(0,\, 1 - y\,h(x)) $$ where $z = (x,y)$ and $h$ is some hypothesis.
Is it possible to provide a proof that this loss is $1$-Lipschitz? More generally, is there a standard method for establishing a Lipschitz bound for functions whose gradient is not defined at every point?
I hope someone can correct my (wrong) intuition: if a function $f$ is $K$-Lipschitz, then the magnitude of its gradient, wherever it exists, is bounded by $K$. But in this case, why is the gradient of the hinge loss necessarily bounded? Can't we choose two points $x_1$ and $x_2$ that are arbitrarily close but carry opposite labels, so that $\ell(z_1,h) = 0$ (when $y_1 h(x_1) \ge 1$) while $\ell(z_2,h) = 1 + h(x_2) \ge 2$?
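For what it's worth, here is a quick numerical sanity check I ran. It treats the hinge loss as a function of the scalar margin $t = y\,h(x)$ (which I am assuming is the variable the Lipschitz claim refers to, rather than $x$ itself) and tests the inequality $|\ell(a) - \ell(b)| \le |a - b|$ on random pairs of margins:

```python
import random

def hinge(t):
    """Hinge loss viewed as a function of the margin t = y * h(x)."""
    return max(0.0, 1.0 - t)

random.seed(0)

# Count violations of the 1-Lipschitz inequality |hinge(a) - hinge(b)| <= |a - b|
# over random pairs of margins (small tolerance for floating-point noise).
violations = 0
for _ in range(100_000):
    a = random.uniform(-5.0, 5.0)
    b = random.uniform(-5.0, 5.0)
    if abs(hinge(a) - hinge(b)) > abs(a - b) + 1e-12:
        violations += 1

print(violations)
```

In my runs this prints `0`, which is consistent with the claim holding in the margin variable, even though the loss is not differentiable at $t = 1$.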