In Goodfellow et al.'s Deep Learning, the authors write on page 83:
> Suppose we have a quadratic function (many functions that arise in practice are not quadratic but can be approximated well as quadratic, at least locally). If such a function has a second derivative of zero, then there is no curvature. It is a perfectly flat line, and its value can be predicted using only the gradient
I'm confused by this statement because the second derivative of a quadratic function must be a nonzero constant. What did the authors mean by this statement? Also, what is the "it" they are referring to in "it is a perfectly flat line"? If it is the quadratic function, how can its value be predicted using only the gradient when the curvature is nonzero?
A general quadratic function is $$f: \mathbb{R}^n \to \mathbb{R}, \qquad f(x) = x^T M x + v^T x + a$$ for some matrix $M \in \mathbb{R}^{n \times n}, v \in \mathbb{R}^n, a \in \mathbb{R}$. The gradient is $\nabla f(x) = (M+M^T) x + v$ and the (matrix of) second derivatives is $\nabla \nabla f(x)=\nabla \nabla f(0) = M+M^T$.
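These formulas are easy to check numerically. Below is a small sketch (not from the book) that compares the closed-form gradient $(M+M^T)x + v$ against a central finite-difference approximation for a randomly chosen $M$, $v$, $a$:

```python
import numpy as np

# Sketch: verify that for f(x) = x^T M x + v^T x + a the gradient
# is (M + M^T) x + v, using central finite differences.
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
v = rng.standard_normal(n)
a = 0.7

def f(x):
    return x @ M @ x + v @ x + a

x0 = rng.standard_normal(n)
eps = 1e-6

# Central finite-difference estimate of the gradient at x0
grad_fd = np.array([
    (f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

# Closed-form gradient from the answer above
grad_exact = (M + M.T) @ x0 + v
assert np.allclose(grad_fd, grad_exact, atol=1e-5)
```

Note that the second-derivative matrix $M + M^T$ is constant in $x$, which is why it can be evaluated at $0$ without loss of generality.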
If $\nabla \nabla f(0) = M + M^T = 0$, then the quadratic term vanishes identically (since $x^T M x = \frac{1}{2} x^T (M + M^T) x = 0$ for all $x$), so $f$ is in fact affine, and knowing $f(0)$ and $\nabla f(0)$ is enough to compute $f(x) = f(0) + \nabla f(0)^T x$ exactly. This is the degenerate case the authors have in mind: their "quadratic" is not required to have nonzero curvature, and the "it" refers to (the graph of) the function, which in that case is a perfectly flat line or hyperplane.
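A quick numerical illustration of this (a sketch with an arbitrarily chosen $v$ and $a$, not anything from the book): when $M = 0$, the value $f(0)$ plus the gradient predicts $f(x)$ exactly everywhere; with nonzero curvature the same first-order prediction is off by $x^T M x$.

```python
import numpy as np

# Sketch: f(x) = x^T M x + v^T x + a for hypothetical v, a.
rng = np.random.default_rng(0)
n = 3
v = rng.standard_normal(n)
a = 1.5

def f(x, M):
    return x @ M @ x + v @ x + a

x = rng.standard_normal(n)

# Zero curvature (M = 0): f is affine, and f(0) plus the gradient at 0
# (which is just v here) predicts f(x) exactly.
M0 = np.zeros((n, n))
grad0 = (M0 + M0.T) @ np.zeros(n) + v
prediction = f(np.zeros(n), M0) + grad0 @ x
assert np.isclose(prediction, f(x, M0))

# Nonzero curvature (M = I): the same first-order prediction
# misses by exactly x^T M x.
M1 = np.eye(n)
grad1 = (M1 + M1.T) @ np.zeros(n) + v
prediction1 = f(np.zeros(n), M1) + grad1 @ x
assert not np.isclose(prediction1, f(x, M1))
assert np.isclose(f(x, M1) - prediction1, x @ M1 @ x)
```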