Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be continuously differentiable. We assume that there exists $L > 0$ such that \begin{equation} \|\nabla f(x)-\nabla f(x')\| \leq L\|x - x'\| \qquad \forall (x,x') \in \mathbb{R}^n \times \mathbb{R}^n. \end{equation}
Show that \begin{equation} |f(x + h) - f(x) - \langle \nabla f(x), h\rangle | \leq \frac{L}{2} \|h\|^2 \qquad \forall (x, h) \in \mathbb{R}^n \times \mathbb{R}^n. \end{equation}
I do not understand one step of the proof, which begins as follows:
Since $f$ is continuously differentiable, the zeroth-order Taylor formula with integral remainder gives \begin{equation} f (x + h) = f(x) + \int^1_0 \langle \nabla f(x + th), h \rangle \, dt. \tag{1} \end{equation}
When I apply the zeroth-order Taylor formula with integral remainder myself, I get \begin{equation} f (x + h) = f(x) + \int_x^{x+h} \nabla f(t) \, dt. \tag{2} \end{equation} What integration step allows us to go from (2) to (1)?
By the fundamental theorem of calculus \begin{aligned} f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x}) &= \int_0^1\nabla f(\mathbf{x}+t\mathbf{h})^T\mathbf{h}\,dt \\ &=\nabla f(\mathbf{x})^T\mathbf{h}+ \int_0^1\big(\nabla f(\mathbf{x}+t\mathbf{h}) - \nabla f(\mathbf{x})\big)^T\mathbf{h}\,dt \end{aligned} and therefore \begin{aligned} |f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x})-\nabla f(\mathbf{x})^T\mathbf{h}| &=\Big|\int_0^1\big(\nabla f(\mathbf{x}+t\mathbf{h}) - \nabla f(\mathbf{x})\big)^T\mathbf{h}\,dt \Big|\\ &\leq \int_0^1\big|\big(\nabla f(\mathbf{x}+t\mathbf{h}) - \nabla f(\mathbf{x})\big)^T\mathbf{h}\big|\,dt \\ &\overset{\text{C.S.}}\leq \int_0^1\big\|\nabla f(\mathbf{x}+t\mathbf{h}) - \nabla f(\mathbf{x})\big\|\,\|\mathbf{h}\|\,dt \\ &\leq\int_0^1tL\|\mathbf{h}\|^2\,dt \\ &= \frac{L}{2}\|\mathbf{h}\|^2 \end{aligned} where C.S. is Cauchy-Schwarz, and the inequality following it comes from the fact that $f$ is $L$-smooth, i.e. $\nabla f$ is $L$-Lipschitz: applied to the points $\mathbf{x}+t\mathbf{h}$ and $\mathbf{x}$, this gives $\|\nabla f(\mathbf{x}+t\mathbf{h}) - \nabla f(\mathbf{x})\| \leq Lt\|\mathbf{h}\|$. The last line is just $\int_0^1 t\,dt = \frac{1}{2}$.
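The first equality above is the step the question asks about. One way to see it, sketched here with an auxiliary function $g$ that is introduced only for this explanation, is to restrict $f$ to the segment from $\mathbf{x}$ to $\mathbf{x}+\mathbf{h}$ and apply the one-dimensional fundamental theorem of calculus: \begin{aligned} g(t) &:= f(\mathbf{x}+t\mathbf{h}), \qquad t \in [0,1], \\ g'(t) &= \nabla f(\mathbf{x}+t\mathbf{h})^T\mathbf{h} \qquad \text{(chain rule)}, \\ f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x}) &= g(1) - g(0) = \int_0^1 g'(t)\,dt = \int_0^1 \nabla f(\mathbf{x}+t\mathbf{h})^T\mathbf{h}\,dt. \end{aligned} Since $f$ is continuously differentiable, $g'$ is continuous on $[0,1]$, so the fundamental theorem applies. Formula (2) only makes sense as written when $n = 1$; in that case the substitution $s = x + th$, $ds = h\,dt$ turns $\int_x^{x+h} f'(s)\,ds$ into $\int_0^1 f'(x+th)\,h\,dt$, which is (1).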
As a side note, this inequality is known as the Descent Lemma, and it is very useful in convex analysis and in optimization more broadly. The analysis of optimization algorithms relies heavily on the ability to quantify the change in function value when moving from some point $\mathbf{x}$ in a direction $\mathbf{h}$.
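As an illustration of how the lemma quantifies such a change, consider the step $\mathbf{h} = -\frac{1}{L}\nabla f(\mathbf{x})$. The step size $\frac{1}{L}$ is an assumption chosen for this example and is not fixed by the lemma itself. Using only the upper bound $f(\mathbf{x}+\mathbf{h}) \leq f(\mathbf{x}) + \nabla f(\mathbf{x})^T\mathbf{h} + \frac{L}{2}\|\mathbf{h}\|^2$, we get \begin{aligned} f\Big(\mathbf{x} - \tfrac{1}{L}\nabla f(\mathbf{x})\Big) &\leq f(\mathbf{x}) - \frac{1}{L}\|\nabla f(\mathbf{x})\|^2 + \frac{L}{2}\cdot\frac{1}{L^2}\|\nabla f(\mathbf{x})\|^2 \\ &= f(\mathbf{x}) - \frac{1}{2L}\|\nabla f(\mathbf{x})\|^2, \end{aligned} so a step of this form decreases the function value by at least $\frac{1}{2L}\|\nabla f(\mathbf{x})\|^2$.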