For a twice differentiable univariate function $f(x_1)$, the intuition for why $f$ is convex if and only if $f''(x_1) \ge 0$ for all $x_1$ is fairly clear, at least once one has understood the notion of "concave up", i.e., the geometric meaning of the second derivative in terms of the sign of/change in the slopes, as is nicely described, e.g., here.
Going now to multivariate functions $f(x_1, x_2, \dots, x_n)$, the "straightforward" lifting of this intuition would be that all the second partial derivatives (i.e., all entries of the Hessian matrix) would have to be non-negative for all $x_1, x_2, \dots, x_n$ in order for $f$ to be convex. I would then conclude that, once more, all slopes encode a "concave-up" scenario. However, as is well known, the true condition for the convexity of $f$ is that the Hessian matrix must be positive semidefinite (which in particular allows some of the partial derivatives to be negative).
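To see concretely why entrywise non-negativity is not the right condition, here is a small numerical sketch (the matrix $Q$ below is my own illustrative choice): every entry of the Hessian is positive, yet the function is not convex.

```python
import numpy as np

# Hypothetical example: every entry of Q (and hence of the Hessian 2Q of
# f(x) = x^T Q x) is positive, yet f is NOT convex.
Q = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# The eigenvalues of Q are 3 and -1, so Q is not positive semidefinite.
eigenvalues = np.linalg.eigvalsh(Q)
print(eigenvalues)  # one eigenvalue is negative

# Along the direction u = (1, -1), the quadratic form is negative:
# u^T Q u = 1 - 2 - 2 + 1 = -2, so f(t*u) = -2 t^2 bends downward.
u = np.array([1.0, -1.0])
print(u @ Q @ u)  # -2.0
```

The slice of $f$ along the direction $(1, -1)$ is $f(t, -t) = -2t^2$, which is concave, even though every second partial derivative of $f$ is positive.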
I am looking for an explanation of why this is so / why the straightforward intuition fails, in a very similar spirit to this question. The answer there is very nice; however, I would very much appreciate one that does not parameterize the function, especially not in a way such that the function truly depends on only one variable once the parameter is set to zero. Rather, I would like an explanation that uses only the original function.
To make it more concrete and simpler:
Isn't there a "simple" argument for, say, bivariate quadratic functions $f(x_1, x_2) = x^T Q x + c^T x$ (with $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $Q = \begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22}\end{bmatrix}$ symmetric)?
As is nicely explained here, the Hessian is then just $2Q$. Now, knowing that $f$ is convex if and only if $Q$ is positive semidefinite is fine, but it is certainly no explanation. The question is: why is this the right/necessary criterion, and what is its geometric interpretation?
How can I relate the $u$ in the condition $u^T Q u \ge 0$ for all $u \in \mathbb{R}^n$ to the convexity of $f$? I hope that once I understand this, I will also see why it would not suffice for the entries of the Hessian to be merely non-negative.
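One way to connect $u$ to convexity is through one-dimensional slices: for $f(x) = x^T Q x + c^T x$, the restriction $g(t) = f(x_0 + t u)$ is a univariate quadratic with $g''(t) = 2\, u^T Q u$, so $u^T Q u \ge 0$ for all $u$ says exactly that every straight-line slice of $f$ is concave-up. A numerical sketch (my own choice of $Q$, $c$, $x_0$, and $u$) checking this identity with a finite difference:

```python
import numpy as np

# For f(x) = x^T Q x + c^T x, the slice g(t) = f(x0 + t*u) is a
# univariate quadratic whose second derivative is 2 u^T Q u.
Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # positive definite (eigenvalues 3 and 1)
c = np.array([1.0, -1.0])

def f(x):
    return x @ Q @ x + c @ x

x0 = np.array([0.5, -0.3])
u = np.array([1.0, 2.0])

# Central finite-difference estimate of g''(0) for g(t) = f(x0 + t*u).
h = 1e-4
g = lambda t: f(x0 + t * u)
g_second = (g(h) - 2 * g(0.0) + g(-h)) / h**2

# Both values are (up to rounding) 2 u^T Q u = 28.
print(g_second, 2 * (u @ Q @ u))
```

Since this holds for every $x_0$ and every direction $u$, positive semidefiniteness of $Q$ is exactly the statement "concave-up along every line", which is the honest multivariate analogue of $f'' \ge 0$; non-negative Hessian entries only control the slices along the coordinate axes and a few other special directions.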
Thank you!
Let $\Omega \subset \mathbb{R}^n$ be open and convex. A $C^1$ function $f \colon \Omega \to \mathbb{R}$ is convex if and only if the tangent plane to the graph of $f$ at any point $x \in \Omega$ lies below the graph of $f$. This means that at any $x \in \Omega$, for any $y \in \Omega$, $$f(y) \geq f(x) + Df(x)(y - x).$$

In order to use the Hessian, let's assume $f$ is $C^2$. Since $f$ is $C^2$ on a convex open set, we can apply Taylor's theorem: $$f(y) = f(x) + Df(x)(y - x) + \frac{1}{2}(y - x) \cdot H(x + \theta(y - x))(y - x),$$ where $\theta \in (0, 1)$. Positive semidefiniteness of the Hessian $H$ at every point ensures that $$(y - x) \cdot H(x + \theta(y - x))(y - x) \geq 0,$$ and therefore that $f$ is convex.

Conversely, if $H(x)$ is not positive semidefinite at some point $x \in \Omega$, then there exists $v \in \mathbb{R}^n$ such that $v \cdot H(x)v < 0$. By continuity of $H$, $v \cdot H(y)v < 0$ for $y$ near $x$. Consequently, for $h \in \mathbb{R}$ small and nonzero, \begin{align} f(x + hv) &= f(x) + Df(x)hv + \frac{1}{2}hv \cdot H(x + \theta hv)hv \\ &= f(x) + Df(x)hv + \frac{1}{2}h^2 v \cdot H(x + \theta hv)v \\ &< f(x) + Df(x)hv, \end{align} which implies that $f$ is not convex.
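Both halves of this argument can be sketched numerically (the two functions below are my own illustrative choices, not from the answer): a function with positive semidefinite Hessian stays above all its tangent planes, while a function with a negative direction $v$ of the Hessian dips below the tangent plane along $v$.

```python
import numpy as np

# Convex case: f(x) = x1^2 + x2^2 has Hessian diag(2, 2), which is
# positive semidefinite, so the graph lies above every tangent plane.
f = lambda x: x[0]**2 + x[1]**2
grad_f = lambda x: np.array([2 * x[0], 2 * x[1]])

rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # Tangent-plane inequality f(y) >= f(x) + Df(x)(y - x).
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-12

# Non-convex case: g(x) = x1^2 - x2^2 has Hessian diag(2, -2), and along
# v = (0, 1) we have v . H v = -2 < 0, so for small nonzero h the graph
# drops below the tangent plane: g(x + h v) < g(x) + Dg(x) . (h v).
g = lambda x: x[0]**2 - x[1]**2
grad_g = lambda x: np.array([2 * x[0], -2 * x[1]])

x = np.array([1.0, 1.0])
v = np.array([0.0, 1.0])
h = 0.1
print(g(x + h * v) < g(x) + grad_g(x) @ (h * v))  # True
```

Here $g(x + hv) = 1 - 1.21 = -0.21$ while the tangent plane predicts $-0.2$, matching the strict inequality in the last display above.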