Role of $f(\mathbf{x}_0)$ and $\nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0)$ in quadratic approximation

I am currently studying the textbook *Algorithms for Optimization* by Mykel J. Kochenderfer and Tim A. Wheeler. Appendix C.6, "Positive Definiteness", says the following:

The notion of a matrix being positive definite or positive semidefinite often arises in linear algebra and optimization for a variety of reasons. For example, if the matrix $\mathbf{A}$ is positive definite in the function $f(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x}$, then $f$ has a unique global minimum.
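
A short argument for this claim, using the definition of positive definiteness stated further down and, for the quantitative bound, the standard fact that for a symmetric matrix $\mathbf{x}^T \mathbf{A} \mathbf{x} \ge \lambda_{\min}(\mathbf{A})\,\|\mathbf{x}\|^2$, where $\lambda_{\min}(\mathbf{A})$ is its smallest eigenvalue:

$$
f(\mathbf{0}) = 0, \qquad f(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x} \ge \lambda_{\min}(\mathbf{A})\,\|\mathbf{x}\|^2 > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}.
$$

So whenever $\lambda_{\min}(\mathbf{A}) > 0$, which is exactly when $\mathbf{A}$ is positive definite, every nonzero $\mathbf{x}$ gives a strictly larger value than the origin, and $\mathbf{x} = \mathbf{0}$ is the unique global minimizer.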

Recall that the quadratic approximation of a twice-differentiable function $f$ at $\mathbf{x}_0$ is

$$f(\mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T(\mathbf{x} - \mathbf{x}_0) + \dfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_0(\mathbf{x} - \mathbf{x}_0) \tag{C.31}$$

where $\mathbf{H}_0$ is the Hessian of $f$ evaluated at $\mathbf{x}_0$. Knowing that $(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_0(\mathbf{x} - \mathbf{x}_0)$ has a unique global minimum is sufficient to determine whether the overall quadratic approximation has a unique global minimum.$^6$
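
To make (C.31) concrete, here is a minimal standard-library sketch for the two-dimensional case. The function names `quadratic_model` and `model_minimizer`, and the numbers used for $\mathbf{x}_0$, $\nabla f(\mathbf{x}_0)$ and $\mathbf{H}_0$, are hypothetical illustrations, not taken from the book. It assumes $\mathbf{H}_0$ is symmetric positive definite, in which case setting the gradient of the model to zero, $\nabla f(\mathbf{x}_0) + \mathbf{H}_0(\mathbf{x} - \mathbf{x}_0) = \mathbf{0}$, gives its unique minimizer.

```python
# Sketch: the quadratic model (C.31) in two dimensions, standard library only.
# Vectors are 2-tuples and H is a symmetric 2x2 matrix ((a, b), (b, c)).

def quadratic_model(f0, g, H, x0):
    """Return q(x) = f0 + g^T (x - x0) + 0.5 (x - x0)^T H (x - x0)."""
    def q(x):
        d = (x[0] - x0[0], x[1] - x0[1])
        Hd = (H[0][0] * d[0] + H[0][1] * d[1],
              H[1][0] * d[0] + H[1][1] * d[1])
        linear = g[0] * d[0] + g[1] * d[1]
        quadratic = 0.5 * (d[0] * Hd[0] + d[1] * Hd[1])
        return f0 + linear + quadratic
    return q


def model_minimizer(g, H, x0):
    """Solve H d = -g by Cramer's rule and return x0 + d.

    Assumes H is symmetric positive definite, so det(H) > 0 and the
    stationary point x0 + d is the unique global minimizer of q.
    """
    (a, b), (_, c) = H
    det = a * c - b * b
    d0 = (-g[0] * c + b * g[1]) / det
    d1 = (-a * g[1] + b * g[0]) / det
    return (x0[0] + d0, x0[1] + d1)


# Hypothetical data: expansion point, gradient and (positive definite) Hessian.
x0 = (1.0, -1.0)
g = (2.0, -1.0)
H = ((4.0, 1.0), (1.0, 3.0))

x_star = model_minimizer(g, H, x0)  # note: f0 is not an input
for f0 in (0.0, 3.0, -10.0):
    q = quadratic_model(f0, g, H, x0)
    print(f"f0 = {f0:6.1f}  minimizer = {x_star}  minimum value = {q(x_star):.4f}")
```

Changing `f0`, the value $f(\mathbf{x}_0)$, changes only the printed minimum value and never `x_star`, while the gradient `g` determines where the minimizer sits relative to $\mathbf{x}_0$.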

A symmetric matrix $\mathbf{A}$ is positive definite if $\mathbf{x}^T \mathbf{A} \mathbf{x}$ is positive for all points other than the origin: $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0$ for all $\mathbf{x} \not= \mathbf{0}$.

A symmetric matrix $\mathbf{A}$ is positive semidefinite if $\mathbf{x}^T \mathbf{A} \mathbf{x}$ is always non-negative: $\mathbf{x}^T \mathbf{A} \mathbf{x} \ge 0$ for all $\mathbf{x}$.
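
Both definitions can be checked through eigenvalues: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive, and positive semidefinite exactly when none is negative, because the smallest eigenvalue equals the minimum of $\mathbf{x}^T \mathbf{A} \mathbf{x}$ over unit vectors. The sketch below is a minimal illustration restricted to the 2×2 case so that the eigenvalues have a closed form; the helper names are hypothetical, and the tolerance `tol` is an assumption to absorb floating-point error for singular matrices.

```python
import math

# Sketch restricted to 2x2 symmetric matrices A = [[a, b], [b, c]] so that the
# eigenvalues have a closed form and only the standard library is needed.

def quadratic_form(a, b, c, x):
    """x^T A x for A = [[a, b], [b, c]] and a 2-vector x."""
    return a * x[0] ** 2 + 2 * b * x[0] * x[1] + c * x[1] ** 2


def eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def classify(a, b, c, tol=1e-12):
    """Classify A by its smallest eigenvalue, i.e. the minimum of x^T A x over unit x."""
    smallest, _ = eigenvalues(a, b, c)
    if smallest > tol:
        return "positive definite"
    if smallest >= -tol:
        return "positive semidefinite"  # but not positive definite
    return "not positive semidefinite"


for A in [(2.0, 0.0, 3.0), (1.0, 0.0, 0.0), (1.0, 2.0, 1.0)]:
    print(A, classify(*A))
```

For $\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ the form is zero along $\mathbf{x} = (0, 1)$, which is why it is positive semidefinite but not positive definite.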

$^6$ The component $f(\mathbf{x}_0)$ merely shifts the function vertically. The component $\nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0)$ is a linear term which is dominated by the quadratic term.
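
The footnote can be made precise by completing the square. This is a sketch under the assumption that $\mathbf{H}_0$ is symmetric positive definite (hence invertible); $q$, $\mathbf{g} = \nabla f(\mathbf{x}_0)$ and $\mathbf{d} = \mathbf{x} - \mathbf{x}_0$ are shorthand introduced here, not the book's notation.

$$
\begin{aligned}
q(\mathbf{d}) &= f(\mathbf{x}_0) + \mathbf{g}^T \mathbf{d} + \tfrac{1}{2}\mathbf{d}^T \mathbf{H}_0 \mathbf{d} \\
&= \underbrace{f(\mathbf{x}_0) - \tfrac{1}{2}\mathbf{g}^T \mathbf{H}_0^{-1} \mathbf{g}}_{\text{constant}} + \tfrac{1}{2}\left(\mathbf{d} + \mathbf{H}_0^{-1}\mathbf{g}\right)^T \mathbf{H}_0 \left(\mathbf{d} + \mathbf{H}_0^{-1}\mathbf{g}\right).
\end{aligned}
$$

The last term is zero at $\mathbf{d} = -\mathbf{H}_0^{-1}\mathbf{g}$ and positive everywhere else, so the model has the unique global minimizer $\mathbf{x}^\ast = \mathbf{x}_0 - \mathbf{H}_0^{-1}\nabla f(\mathbf{x}_0)$. The constant $f(\mathbf{x}_0)$ adds the same amount at every $\mathbf{x}$, which is the vertical shift; the linear term only recenters the bowl and lowers its floor without changing its shape, so neither can create or destroy the minimum. For large steps the linear term is also outweighed directly: with $\lambda_{\min} > 0$ the smallest eigenvalue of $\mathbf{H}_0$,

$$
\left|\mathbf{g}^T \mathbf{d}\right| \le \|\mathbf{g}\|\,\|\mathbf{d}\| < \tfrac{1}{2}\lambda_{\min}\|\mathbf{d}\|^2 \le \tfrac{1}{2}\mathbf{d}^T \mathbf{H}_0 \mathbf{d} \quad \text{whenever } \|\mathbf{d}\| > \frac{2\|\mathbf{g}\|}{\lambda_{\min}}.
$$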

I found this part interesting:

$^6$ The component $f(\mathbf{x}_0)$ merely shifts the function vertically. The component $\nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0)$ is a linear term which is dominated by the quadratic term.

I was wondering if someone would please take the time to expand upon this in more detail. How does $f(\mathbf{x}_0)$ "shift" the function vertically? Why is this term, as well as the term $\nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0)$, implied to be insignificant?
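
The footnote's two claims can be checked numerically in one dimension, where (C.31) reads $q(t) = c + b\,t + \tfrac{1}{2} a t^2$ with $t = x - x_0$, $c = f(x_0)$, $b = f'(x_0)$ and $a = f''(x_0) > 0$. The values of $a$, $b$ and $c$ below are hypothetical; the sketch shows that $c$ only raises or lowers the curve, while $b$ moves the minimizer to $t^\ast = -b/a$ and becomes negligible next to the quadratic term as $|t|$ grows.

```python
def q(t, c, b, a):
    """One-dimensional quadratic model q(t) = c + b*t + 0.5*a*t**2, where t = x - x0."""
    return c + b * t + 0.5 * a * t * t


a, b = 2.0, 6.0   # hypothetical curvature (a > 0) and slope
t_star = -b / a   # stationary point; depends on a and b, not on c

# Vertical shift: every value of q rises by exactly the change in c.
for c in (0.0, 5.0):
    print(f"c = {c}: q(t*) = {q(t_star, c, b, a)}, q(t* + 1) = {q(t_star + 1, c, b, a)}")

# Domination: |linear term| / quadratic term = 2|b| / (a|t|), which tends to 0.
for t in (10.0, 100.0, 1000.0):
    print(f"t = {t:6.0f}: ratio = {abs(b * t) / (0.5 * a * t * t):.4f}")
```

If $a \le 0$ the argument breaks down: with $a = 0$ and $b \ne 0$ the model is a straight line with no minimum at all, which is why the footnote's reasoning relies on the quadratic term being positive definite.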