Is the following proposition true? Is there a simple proof?
Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a twice continuously differentiable function. If there exists $M$ such that $||\nabla^2 f(x)||_2 \leq M$ for all $x$ then $$ \forall x, y \in \mathbb{R}^n: \quad ||\nabla f(x) - \nabla f(y)||_2 \leq M ||x - y||_2 $$
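As a quick numerical sanity check of the claim (not a proof), consider the quadratic $f(x) = \frac{1}{2} x^T A x$ with $A$ symmetric: here $\nabla f(x) = Ax$, the Hessian is the constant matrix $A$, and we can take $M = \lVert A \rVert_2$. The test function and sampling below are my own choices, using numpy:

```python
import numpy as np

# Sanity check of the claim for f(x) = 0.5 * x^T A x with A symmetric:
# grad f(x) = A x, the Hessian is the constant matrix A, so M = ||A||_2
# and the claimed bound ||grad f(x) - grad f(y)|| <= M ||x - y|| should hold.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                  # random symmetric Hessian
M = np.linalg.norm(A, 2)           # spectral norm = bound on the Hessian

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = np.linalg.norm(A @ x - A @ y)    # ||grad f(x) - grad f(y)||
    rhs = M * np.linalg.norm(x - y)
    assert lhs <= rhs + 1e-12
```

For a quadratic the bound is tight: equality is attained when $x - y$ is an eigenvector of $A$ for the eigenvalue of largest magnitude.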
Update
I believe I was able to prove it. Please tell me if I am wrong here, and where. First, recall that any real symmetric matrix $H$ is orthogonally diagonalizable, $H = P^T \Lambda P$ with $P$ orthogonal, and therefore: $$ u^T H u = (P u)^T \Lambda (P u) \leq \lambda_H ||P u||^2 = \lambda_H ||u||^2 $$ where $$\lambda_H = \max_{i = 1, \dots, n} |\lambda_i(H)|$$ Also, since $||H||_2^2 = \lambda_{\max}(H^T H)$ and $H^T H = P^T \Lambda^2 P$, we know that $||H||_2 = \lambda_H$, and therefore: $$ u^T H u \leq ||H||_2 \, ||u||^2 $$
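The two facts above can be checked numerically on a random symmetric matrix (the matrix and vector below are arbitrary, generated with numpy):

```python
import numpy as np

# Check the two facts used above for a random real symmetric H:
# (1) ||H||_2 equals max_i |lambda_i(H)|,
# (2) |u^T H u| <= ||H||_2 * ||u||^2 for any u.
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
H = (B + B.T) / 2                  # real symmetric

lam = np.linalg.eigvalsh(H)        # eigenvalues of a symmetric matrix
lam_H = np.max(np.abs(lam))
assert np.isclose(np.linalg.norm(H, 2), lam_H)

u = rng.standard_normal(6)
assert abs(u @ H @ u) <= lam_H * (u @ u) + 1e-12
```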
Now, using Taylor's theorem with the second-order Lagrange remainder, we have: $$ \begin{aligned} f(x) - f(y) &= \nabla f(y)^T (x - y) + 0.5 (x - y)^T \nabla^2 f(z) (x - y) \\ f(y) - f(x) &= -\nabla f(x)^T (x - y) + 0.5 (x - y)^T \nabla^2 f(w) (x - y) \end{aligned} $$ for some $z, w$ on the segment $[x, y]$. Adding both equations and re-arranging, we have: $$ \begin{aligned} (\nabla f(y) - \nabla f(x))^T(x - y) &= -0.5 (x - y)^T \nabla^2 f(z) (x - y) - 0.5 (x - y)^T \nabla^2 f(w) (x - y) \\ &\geq -0.5 ||\nabla^2 f(z)||_2 \, ||x - y||^2 - 0.5 ||\nabla^2 f(w)||_2 \, ||x - y||^2 \\ &\geq -M ||x - y||^2 \end{aligned} $$ Using Cauchy–Schwarz we also obtain $$ - ||\nabla f(y) - \nabla f(x)|| \cdot ||y - x|| \geq (\nabla f(y) - \nabla f(x))^T(x - y) \geq -M ||x - y||^2 $$ Dividing both sides by $-||y - x||$ gives the desired result.
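The inner-product lower bound derived above can at least be probed numerically. Here is a check (my own example, not from the post) with $f(x) = \sum_i \sin(x_i)$, so $\nabla f(x) = \cos(x)$ elementwise and the Hessian $\mathrm{diag}(-\sin(x_i))$ has spectral norm at most $M = 1$:

```python
import numpy as np

# Check the bound (grad f(y) - grad f(x))^T (x - y) >= -M ||x - y||^2
# for f(x) = sum_i sin(x_i): grad f(x) = cos(x), ||Hessian||_2 <= M = 1.
rng = np.random.default_rng(4)
M = 1.0
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    inner = (np.cos(y) - np.cos(x)) @ (x - y)
    assert inner >= -M * np.linalg.norm(x - y) ** 2 - 1e-12
```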
Let $X=\nabla^2f(x)$ be the Hessian (this matrix is real symmetric); then
$$||X||_2^2 = \lambda_{max}(X^TX)=\lambda_{max}(X^2)=\max\{|\lambda|^2 : \lambda \mbox{ is an eigenvalue of } X\}$$
so the bound on the 2-norm of the Hessian is equivalent to a bound on the maximum eigenvalue in absolute value. This is precisely the operator norm, i.e.
$$ \lVert Xu \rVert \leq \lVert X \rVert \lVert u \rVert $$
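This operator-norm inequality, and the fact that it is attained at a top eigenvector, can be illustrated with a random symmetric matrix (an arbitrary example of mine, using numpy):

```python
import numpy as np

# Check ||X u|| <= ||X||_2 ||u|| for a random symmetric X and random u,
# and that equality holds at an eigenvector of largest |eigenvalue|.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
X = (B + B.T) / 2                  # symmetric, like a Hessian
norm_X = np.linalg.norm(X, 2)

u = rng.standard_normal(4)
assert np.linalg.norm(X @ u) <= norm_X * np.linalg.norm(u) + 1e-12

w, V = np.linalg.eigh(X)           # eigendecomposition of symmetric X
v = V[:, np.argmax(np.abs(w))]     # unit eigenvector, largest |eigenvalue|
assert np.isclose(np.linalg.norm(X @ v), norm_X)
```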
Now consider the function $h_a(t)=a^T \nabla f(x+t(y-x))$. Since $f$ is twice continuously differentiable, $h_a$ is continuously differentiable on $[0, 1]$, so the mean value theorem applies: there exists $t_a \in (0, 1)$ such that
$$ a^T(\nabla f(y) - \nabla f(x)) = \dfrac{h_a(1)-h_a(0)}{1-0}=h_a'(t_a)= a^T \nabla^2 f(x+t_a(y-x))(y-x) = a^T\nabla^2 f(z_a)(y-x) $$ where $z_a = x + t_a (y - x)$.
And now taking the supremum over $\lVert a \rVert = 1$ on both sides
$$ \lVert \nabla f(y) - \nabla f(x) \rVert \leq \sup_{\lVert a \rVert = 1 } a^T(\nabla f(y) - \nabla f(x)) = \sup_{\lVert a \rVert = 1 } a^T\nabla^2 f(z_a)(y-x) \leq M \lVert y - x \rVert $$
where the first inequality comes from taking the particular value $a = (\nabla f(y) - \nabla f(x)) / \lVert \nabla f(y) - \nabla f(x) \rVert$ (when the difference is nonzero; otherwise the bound is trivial), and the inequality on the right is an application of Cauchy–Schwarz together with the hypothesis $\lVert \nabla^2 f(z_a) \rVert_2 \leq M$.
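As an end-to-end illustration of the result on a non-quadratic function (my own example, using numpy): take $f(x) = \sum_i \sin(x_i)$, so $\nabla f(x) = \cos(x)$ elementwise and the Hessian $\mathrm{diag}(-\sin(x_i))$ has spectral norm at most $M = 1$, so the gradient should be 1-Lipschitz:

```python
import numpy as np

# End-to-end check: f(x) = sum_i sin(x_i), grad f(x) = cos(x) elementwise,
# Hessian = diag(-sin(x_i)) with spectral norm <= M = 1, so we expect
# ||cos(x) - cos(y)|| <= ||x - y|| for all x, y.
rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = np.linalg.norm(np.cos(x) - np.cos(y))
    assert lhs <= np.linalg.norm(x - y) + 1e-12
```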