I understand the definition of the definiteness of a matrix, mathematically. But I struggle to understand why it's defined that way, and intuitively how mathematicians come up with that definition that leads to so many properties.
For example, the covariance matrix is positive semidefinite, semidefinite programming has a list of nice properties, etc. I can kind of follow the mathematical deduction, but I still struggle to intuitively understand what $x^T M x \geq 0$ stands for.
One motivating application of positive definiteness is its use in classifying local minima or maxima of multivariate real functions.
This is because, according to the multivariate version of Taylor's Theorem, $$f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{x}-\mathbf{a}) + \boxed{ \frac{1}{2}(\mathbf{x}-\mathbf{a})^T H f(\mathbf{a}) (\mathbf{x}-\mathbf{a})}.$$ Here $f$ is a function of $n$ variables, $\mathbf{x}$ and $\mathbf{a}$ are $n$-dimensional vectors, $Df(\mathbf{a})$ is the $1 \times n$ matrix of first-order partial derivatives of $f$ evaluated at $\mathbf{a}$, and $H f(\mathbf{a})$ is the $n \times n$ matrix of second-order partial derivatives of $f$ evaluated at $\mathbf{a}$, called the Hessian. Notice the similarity of the boxed portion of the equation to $x^T M x$ in the definition of positive definiteness.
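To see the approximation in action, here is a small numerical sketch (my own illustration, assuming NumPy) using the hypothetical quadratic $f(x, y) = x^2 + 3y^2 + xy$, for which the second-order Taylor expansion is exact:

```python
import numpy as np

# Hypothetical example: f(x, y) = x^2 + 3y^2 + x*y
def f(v):
    x, y = v
    return x**2 + 3 * y**2 + x * y

a = np.array([1.0, 2.0])             # expansion point a
Df = np.array([2 * a[0] + a[1],      # gradient Df(a), computed by hand
               6 * a[1] + a[0]])
H = np.array([[2.0, 1.0],            # Hessian Hf(a): constant, since f is quadratic
              [1.0, 6.0]])

x = np.array([1.1, 2.05])            # a point near a
d = x - a
taylor = f(a) + Df @ d + 0.5 * d @ H @ d
print(abs(taylor - f(x)))            # ~0: the expansion is exact for a quadratic f
```

The boxed term in the theorem is exactly the `0.5 * d @ H @ d` contribution; for a non-quadratic $f$ the agreement would only be approximate near $\mathbf{a}$.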
The theorem implies that if $Df(\mathbf{a}) = \mathbf{0}$ and $H f(\mathbf{a})$ is positive definite, then $f$ has a local minimum at $\mathbf{a}$, which is analogous to the single-variable theorem stating that if $f'(a) = 0$ and $f''(a) > 0$, then $f$ has a local minimum at $a$. On the other hand, if $H f(\mathbf{a})$ is negative definite, then $f$ has a local maximum at $\mathbf{a}$.
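As a concrete sketch of this classification (my own example, assuming NumPy), consider the hypothetical function $f(x, y) = x^2 + y^2 - xy$, whose gradient vanishes at the origin. Checking the eigenvalues of the Hessian is one standard way to test definiteness:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + y^2 - x*y at the origin (a critical point)
H = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

eigvals = np.linalg.eigvalsh(H)      # H is symmetric, so eigvalsh applies
if np.all(eigvals > 0):
    verdict = "local minimum"        # H positive definite
elif np.all(eigvals < 0):
    verdict = "local maximum"        # H negative definite
else:
    verdict = "saddle or inconclusive"
print(eigvals, verdict)              # eigenvalues 1 and 3 -> local minimum
```

A symmetric matrix is positive definite exactly when all its eigenvalues are positive, which is why this eigenvalue test works.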
Added Oct 21, 2019:
Another way to look at the definition is geometric: $x^T M x$ is the dot product of $x$ and $Mx$. So $x^T M x \ge 0$ means the dot product is non-negative; hence the cosine of the angle between $x$ and $Mx$ is non-negative, and the angle is either a right angle or acute.
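This geometric reading can be checked empirically (a sketch of my own, assuming NumPy): for a positive definite $M$, the cosine of the angle between $x$ and $Mx$ never goes negative, no matter which $x$ we try.

```python
import numpy as np

# A positive definite matrix (eigenvalues 1 and 3)
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

rng = np.random.default_rng(0)
cosines = []
for _ in range(1000):
    x = rng.standard_normal(2)       # random nonzero direction
    # cosine of the angle between x and Mx, via the dot product x^T (Mx)
    cosines.append((x @ M @ x) / (np.linalg.norm(x) * np.linalg.norm(M @ x)))

min_cos = min(cosines)
print(min_cos)                       # non-negative: the angle never exceeds 90 degrees
```

A matrix that failed this test for even one $x$ would, by the definition, not be positive semidefinite.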