I'm trying to understand the following paragraph from Boyd & Vandenberghe, page 488:
(...) we assume that the Hessian of $f$ is Lipschitz continuous on $S$ with constant $L$, i.e., $$ \| \nabla^{2}f(x) - \nabla^{2} f(y) \|_{2} \leq L \| x-y \|_{2} $$ for all $x, y \in S$. The coefficient $L$, which can be interpreted as a bound on the third derivative of $f$, can be taken to be zero for a quadratic function. More generally $L$ measures how well $f$ can be approximated by a quadratic model (...)
What exactly is the reason for stating a bound on the third derivative this way, rather than, say, $$ \| \nabla^{3} f(x) \|_{2} \leq M < \infty $$ for all $x \in S$? Are these two statements somehow identical, or does one imply the other? What is the relationship, if any, between $L$ and $M$ here?
Since the third derivative of a quadratic function is zero, I expect the reason for stating a bound on the third derivative of $f$ this way is simply to support an interpretation of $L$ as a measure of how well $f$ can be approximated by a quadratic model, because when $f$ is already quadratic, $L$ can be taken to be zero.
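To make the "quadratic model" interpretation concrete, here is a sketch of the standard error bound, under the assumption (mine, not stated in the quoted passage) that the whole segment from $x$ to $y$ lies in $S$. Write $h = y - x$. Taylor's formula with integral remainder, together with $\int_0^1 (1-t)\,dt = \tfrac12$, gives
$$ f(y) - f(x) - \nabla f(x)^{T} h - \tfrac{1}{2} h^{T} \nabla^{2} f(x) h = \int_{0}^{1} (1-t)\, h^{T} \left[ \nabla^{2} f(x+th) - \nabla^{2} f(x) \right] h \, dt. $$
Applying the Lipschitz condition $\| \nabla^{2} f(x+th) - \nabla^{2} f(x) \|_{2} \leq L t \| h \|_{2}$ inside the integral yields
$$ \left| f(y) - f(x) - \nabla f(x)^{T} h - \tfrac{1}{2} h^{T} \nabla^{2} f(x) h \right| \leq L \| h \|_{2}^{3} \int_{0}^{1} (1-t)\, t \, dt = \frac{L}{6} \| x-y \|_{2}^{3}. $$
So the second-order Taylor model at $x$ is off by at most $\frac{L}{6}\|x-y\|_2^3$, and the error vanishes identically when $L=0$.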
I expect that the relationship between $L$ and $M$ here is that if $\nabla^{3} f(x)$ exists for each $x\in S$ (and $S$ is convex), then we can take $L=M$. There is the following real analysis intuition for this. A differentiable function $g:\Bbb R\to\Bbb R$ is $L$-Lipschitz iff $|g'(x)|\le L$ for each $x\in\Bbb R$. Indeed, the implication $(\Rightarrow)$ follows from the definition of the derivative, since every difference quotient satisfies $|g(y)-g(x)|/|y-x|\le L$, and the implication $(\Leftarrow)$ follows from Lagrange's mean value theorem, which states that for all real $x<y$ there exists $z\in (x,y)$ such that $g(y)-g(x)=g'(z)(y-x)$.
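The same argument can be sketched in several variables, under assumptions the question leaves open: $S$ is open and convex, $f$ is three times continuously differentiable on $S$, and $\| \nabla^{3} f(z) \|_{2}$ means the operator norm $\sup_{\|h\|_2 = 1} \| \nabla^{3} f(z)[h] \|_{2}$, where $\nabla^{3} f(z)[h]$ denotes the matrix with entries $\sum_{k} \frac{\partial^{3} f(z)}{\partial x_i \partial x_j \partial x_k} h_k$. For $(\Leftarrow)$, integrate along the segment from $x$ to $y$:
$$ \nabla^{2} f(y) - \nabla^{2} f(x) = \int_{0}^{1} \nabla^{3} f\big(x + t(y-x)\big)[\,y-x\,] \, dt \quad\Longrightarrow\quad \| \nabla^{2} f(y) - \nabla^{2} f(x) \|_{2} \leq M \| y-x \|_{2}. $$
For $(\Rightarrow)$, use the definition of the derivative in the direction $h$:
$$ \nabla^{3} f(x)[h] = \lim_{t \to 0} \frac{\nabla^{2} f(x+th) - \nabla^{2} f(x)}{t} \quad\Longrightarrow\quad \| \nabla^{3} f(x)[h] \|_{2} \leq L \| h \|_{2}. $$
With this choice of norm, the smallest admissible $L$ and the smallest admissible $M$ therefore coincide. With a different norm on the third-derivative tensor, the two constants would agree only up to a dimension-dependent factor.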