If a function defined on $\mathbb{R}^n$ is
- infinitely differentiable,
- convex, i.e., $f\left( t x_1 + (1-t) x_2 \right) \le t f(x_1) + (1-t) f(x_2)$, and
- bounded below,
can we assert that its minimizers, if they exist, form a connected set?
Background
I encountered this question while working on maximum likelihood estimation of probabilistic graphical models. The negative log-likelihood is bounded below by zero (since probabilities are at most one), and its Hessian is positive semi-definite (but not positive definite). Intuitively, I guess its minimizers form a connected set, so that optimizing the loss function will in principle give a good estimate of the model parameters, provided the true model is not pathological.
Here is a specific example.
All variables are real. The probability of an observed data point $s$ is given by $$ P(s \mid h, J) = \frac{1}{Z(h,J)} \exp\left( h^T s + \frac{1}{2} s^T J s \right) = \frac{1}{Z(h,J)} \exp\left( \sum_{i=1}^{N} h_i s_i + \sum_{i=1}^N \sum_{j>i}^N J_{ij} s_i s_j \right). $$ Here $s = (s_1, \cdots, s_N)^T$ with $s_i = \pm 1$, $N$ is a given integer, $h=(h_1, \cdots, h_N)^T$ is real, and $J$ is an $N \times N$ real symmetric matrix whose diagonal elements are zero, i.e., $$ J= \begin{pmatrix} 0 & J_{12} & \cdots & J_{1N} \\ J_{12} & \ddots & & J_{2N} \\ \vdots & & \ddots & \vdots \\ J_{1N} & J_{2N} & \cdots & 0 \\ \end{pmatrix}. $$ The normalization factor (partition function) is therefore $$ Z(h,J) = \sum_{s_1=\pm 1} \cdots \sum_{s_N=\pm 1} \exp\left( h^T s + \frac{1}{2} s^T J s \right) . $$ (FYI, this is just the generalized Ising model in physics, or a Markov random field in statistics.)
The negative log-likelihood (loss function) for the above probability is $$ L(h,J) = - h^T s - \frac{1}{2} s^T J s + \log Z(h,J) , $$ which is infinitely differentiable, convex in $h$ and $J$ jointly, and bounded below by zero.
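For small $N$ the partition function can be enumerated directly, which makes it easy to check the two claimed properties (convexity and the lower bound of zero) numerically. Here is a minimal sketch in Python; the function name and the brute-force enumeration are my own choices for illustration, not part of the model above:

```python
import itertools
import numpy as np

def neg_log_likelihood(h, J, s):
    """L(h, J) = -h^T s - (1/2) s^T J s + log Z(h, J) for one observation s,
    with Z computed by brute-force enumeration over all 2^N configurations
    (feasible only for small N)."""
    N = len(h)
    # log Z via log-sum-exp over all spin configurations, for stability
    energies = [h @ np.array(x) + 0.5 * np.array(x) @ J @ np.array(x)
                for x in itertools.product([-1.0, 1.0], repeat=N)]
    logZ = np.logaddexp.reduce(energies)
    return -(h @ s + 0.5 * s @ J @ s) + logZ

rng = np.random.default_rng(0)
N = 4
s = rng.choice([-1.0, 1.0], size=N)

def random_params():
    h = rng.normal(size=N)
    A = rng.normal(size=(N, N))
    J = (A + A.T) / 2          # symmetric ...
    np.fill_diagonal(J, 0.0)   # ... with zero diagonal
    return h, J

(h1, J1), (h2, J2) = random_params(), random_params()

# L >= 0, since P(s) <= 1
assert neg_log_likelihood(h1, J1, s) >= 0.0

# Convexity along the segment between two random parameter points:
# the graph lies below the chord
for t in np.linspace(0.0, 1.0, 11):
    mid = neg_log_likelihood(t * h1 + (1 - t) * h2, t * J1 + (1 - t) * J2, s)
    chord = (t * neg_log_likelihood(h1, J1, s)
             + (1 - t) * neg_log_likelihood(h2, J2, s))
    assert mid <= chord + 1e-9
```

This only probes convexity along one random segment, of course; the actual proof of convexity goes through the log-convexity of $Z(h,J)$.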
Firstly, $f$ may not have a minimum at all. For example, $f(x) = e^x$ is smooth, convex, and bounded below, but attains no minimum.
Now suppose $f$ has a global minimum. After adding a constant, we may assume $f \geqslant 0$ and $\min f = 0$. Suppose there exist two points $x_1 \neq x_2$ such that $f(x_1) = f(x_2) = 0$. By convexity, for all $t \in [0,1]$, $f((1-t)x_1 + t x_2) \leqslant (1-t) f(x_1) + t f(x_2) = 0$. Since $f \geqslant 0$, in fact $f((1-t)x_1 + t x_2) = 0$. Consequently, the entire segment $[x_1, x_2]$ minimizes $f$, so the set of minimizers $\{x : f(x) = 0\}$ is a convex subset of $\mathbb{R}^n$, and a convex set is connected.
Notice that the smoothness of $f$ is never used here.
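The argument can be seen concretely on a toy example (my own choice, not from the question): $f(x, y) = (x - y)^2$ is smooth, convex, and bounded below, and its minimizers form the line $x = y$, a convex and hence connected set.

```python
import numpy as np

# A smooth convex function on R^2 whose minimizers form the line x = y
def f(p):
    x, y = p
    return (x - y) ** 2

# Two distinct minimizers, f = 0 at each
x1 = np.array([3.0, 3.0])
x2 = np.array([-1.0, -1.0])

# Every point of the segment [x1, x2] is also a minimizer
for t in np.linspace(0.0, 1.0, 5):
    p = (1 - t) * x1 + t * x2
    assert f(p) == 0.0
```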