I'm looking at a proof of Gradient descent convergenceLecture notes, but I can't follow where how Lemma 4.1 comes from the Lipschitz assumption.
A function f is L-Lipschitz if and only if: $|| \nabla f(x) - \nabla f(y)||_2 \le L||x-y|| $
Lemma 4.1: If $f$ is L-Lipschitz, then
$$|f(y)-f(x) - \langle\nabla f(x),y-x \rangle| \le \frac{L}{2}||y-x||^2$$
Can anyone explain where the Lemma comes from? Or point me to a reference that shows this?
Thanks
$f(y) = f(x) + {\partial f(x) \over \partial x} (y-x) + \int_0^1 ( {\partial f(x+t(y-x)) \over \partial x} - { \partial f(x) \over \partial x} ) (y-x) dt$.
Hence $f(y) - f(x) - {\partial f(x) \over \partial x} (y-x) = \int_0^1 ( {\partial f(x+t(y-x)) \over \partial x} - { \partial f(x) \over \partial x} ) (y-x) dt$.
Note that $|\int_0^1 ( {\partial f(x+t(y-x)) \over \partial x} - { \partial f(x) \over \partial x} ) (y-x) dt | \le t\int_0^1 L \|x-y\|^2 dt = {1 \over 2} L\|x-y\|^2$.