The original paper, Generative Adversarial Nets (Goodfellow et al., 2014), states in Section 4.2:
The subderivatives of a supremum of convex functions include the derivative of the function at the point where the maximum is attained.
I am not able to understand this statement. Can someone please point me to a mathematical proof or representation behind this concept?
Let
$f(x)=\sup_{\alpha} f_{\alpha}(x)$
where the functions $f_{\alpha}$ are convex on some convex domain $D$. It's a standard theorem that $f$ is convex on $D$.
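As a quick numeric sanity check (not a proof, and the family below is an assumed example, not from the paper): take the affine family $f_{\alpha}(x) = \alpha x - \alpha^2/2$, whose pointwise supremum is convex, and verify the midpoint inequality on a sample grid.

```python
import numpy as np

# Finite stand-in for the index set alpha; each f_alpha is affine in x,
# so the pointwise supremum f is convex (here we just spot-check it).
alphas = np.linspace(-3.0, 3.0, 61)

def f(x):
    return np.max(alphas * x - alphas**2 / 2)

# Midpoint convexity check: f((x+y)/2) <= (f(x)+f(y))/2 on a grid.
xs = np.linspace(-2.0, 2.0, 41)
for x in xs:
    for y in xs:
        assert f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12
print("midpoint convexity holds on the sample grid")
```

Since a maximum of finitely many affine functions is convex exactly, the assertions pass up to floating-point tolerance.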
Suppose that at a particular point $x$ in $D$ the supremum is attained, say at the index $\beta$:
$\beta \in \arg \max_{\alpha} f_{\alpha}(x)$.
Then
$f(x)=f_{\beta}(x)$.
Let $g$ be any subgradient of $f_{\beta}$ at $x$; that is, $g \in \partial f_{\beta}(x)$. By the definition of the subdifferential,
$f_{\beta}(y) \geq f_{\beta}(x) + g^{T}(y-x)$
for all $y$ in $D$.
Since $f(y) \geq f_{\beta}(y)$ for all $y$ in $D$, and $f_{\beta}(x) = f(x)$,
$f(y) \geq f(x) + g^{T}(y-x)$
for all $y$ in $D$. Thus $g \in \partial f(x)$.
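The argument above can be illustrated numerically (an assumed toy example, not from the paper): for $f(x) = \max(f_1(x), f_2(x))$ with $f_1(x) = (x-1)^2$ and $f_2(x) = (x+1)^2$, the gradient of whichever function attains the maximum at a point satisfies the subgradient inequality for $f$ at that point.

```python
import numpy as np

f1 = lambda x: (x - 1) ** 2
f2 = lambda x: (x + 1) ** 2
f  = lambda x: max(f1(x), f2(x))   # pointwise maximum, convex

x = 2.0              # here f2 attains the max: f2(2) = 9 > f1(2) = 1
g = 2 * (x + 1)      # g = f2'(x) = 6, the subgradient candidate

# Subgradient inequality: f(y) >= f(x) + g*(y - x) for all sampled y.
for y in np.linspace(-5.0, 5.0, 201):
    assert f(y) >= f(x) + g * (y - x) - 1e-12
print("subgradient inequality holds at x =", x)
```

Geometrically, the line $f(x) + g(y - x)$ supports the graph of $f$ from below, touching it at $y = x$, which is exactly the statement $g \in \partial f(x)$.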
Since this holds for any subgradient $g$ in $\partial f_{\beta}(x)$, $\partial f_{\beta}(x) \subseteq \partial f(x)$.
Note that the authors of this paper use somewhat inconsistent notation: they write $\partial f_{\beta}(x) \in \partial f(x)$, but in fact the subdifferential of $f_{\beta}$ at $x$ is a subset of the subdifferential of $f$ at $x$.