Smoothness (i.e. Lipschitz continuous gradient) of supremum


Define $$ \mathcal{P} = \{p \in \mathbb{R}^n \ | \ \sum_{i=1}^n p_{(i)}=1, \ p_{(i)} \geq 0, \ \sum_{i=1}^n \phi(p_{(i)}) \leq \rho \}, $$ where $p_{(i)}$ is the $i$-th element of the vector $p$, $\phi$ is a strictly convex function and $\rho$ is a positive scalar. Let $\Theta \subset \mathbb{R}^d$, $\mathbb{X} \subset \mathbb{R}^m$ and $l : \Theta \times \mathbb{X} \to \mathbb{R}$. Moreover, define $$ f(\theta) := \sup_{p \in \mathcal{P}} \sum_{i=1}^n p_{(i)}\ l(\theta,x_i), $$ where $x_i \in \mathbb{X}$ for $i=1,\dots,n$ and $l(\theta,x_i)$ is convex in $\theta$ for any fixed $x_i$.
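For concreteness, $f(\theta)$ can be evaluated numerically: for fixed $\theta$ the objective is linear in $p$, so the inner supremum is a convex program. A minimal sketch, assuming the hypothetical choices $\phi(t) = t^2$, $\rho = 0.5$, and $l(\theta, x_i) = (\theta - x_i)^2$ (none of which are fixed by the question):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance: phi(t) = t^2 (strictly convex), rho = 0.5,
# loss l(theta, x) = (theta - x)^2, and a small data set x_1, ..., x_4.
x = np.array([-1.0, 0.0, 0.5, 2.0])
rho = 0.5
n = len(x)

def f(theta):
    losses = (theta - x) ** 2  # l(theta, x_i) for each i
    cons = [
        # simplex constraint: sum_i p_i = 1
        {"type": "eq", "fun": lambda p: p.sum() - 1.0},
        # phi-constraint: sum_i phi(p_i) = sum_i p_i^2 <= rho
        {"type": "ineq", "fun": lambda p: rho - np.sum(p ** 2)},
    ]
    # maximize sum_i p_i * l_i  <=>  minimize its negation over P
    res = minimize(lambda p: -(losses @ p), x0=np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n, constraints=cons, method="SLSQP")
    return -res.fun

# Sampling f on a grid of theta values traces out the robust objective;
# kinks can appear where the maximizing p changes.
vals = [f(t) for t in np.linspace(-2.0, 3.0, 11)]
```

Since the uniform distribution satisfies $\sum_i (1/n)^2 = 1/n \leq \rho$ here, it is feasible, so $f(\theta)$ lies between the average and the maximum of the losses.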

Definition: A function $g : Z \to \mathbb{R}$, with $Z \subset \mathbb{R}^s$, is $\beta$-smooth if it has a $\beta$-Lipschitz continuous gradient, that is, $$ \|\nabla g(x) - \nabla g(y)\| \leq \beta\|x - y\|, $$ for all $x,y \in Z$.

Question: I know that $f(\theta)$ as defined above is not smooth in general. Thus, I would like to know which extra conditions would guarantee smoothness of $f(\theta)$. For example, do we need to add more constraints to the set $\mathcal{P}$? Do we need to upper bound $\rho$? Do we need to make stronger assumptions on $l(\theta,x)$ and/or $\phi$?
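One way to see why extra conditions are needed: if $\rho$ is so large that the $\phi$-constraint is never active, the supremum is attained at a vertex of the simplex and $f(\theta) = \max_i l(\theta, x_i)$, which is generally non-smooth even for smooth convex $l$. A minimal numeric illustration, assuming the hypothetical losses $l(\theta, x_i) = (\theta - x_i)^2$ with two data points:

```python
import numpy as np

# With the phi-constraint inactive, f(theta) = max_i l(theta, x_i).
x = np.array([0.0, 1.0])
f = lambda theta: float(np.max((theta - x) ** 2))

# Near theta = 0.5 the active index switches, so the derivative jumps
# from 2*(theta - 1) to 2*theta: a kink, hence no Lipschitz gradient.
h = 1e-6
left = (f(0.5) - f(0.5 - h)) / h    # one-sided slope from the left, near -1
right = (f(0.5 + h) - f(0.5)) / h   # one-sided slope from the right, near +1
```

The gap between the one-sided slopes at $\theta = 0.5$ shows the kink; any sufficient condition for smoothness must rule out this regime, e.g. by keeping the $\phi$-constraint active (strictly convex $\phi$ with $\rho$ small enough) so that the maximizer $p^*(\theta)$ varies continuously.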

I know the question is somewhat open-ended and probably admits more than one answer, but I hope it is concrete enough to get something useful. For reference, this question is related to distributionally robust optimization (see this paper).