Convex conjugacy and differentiating under the "sup" operator


I am working through example 3.10 (page 71) of a book on variational inference in statistics and the authors discuss taking the gradient of the following function with respect to $\mu$: $$ A^*(\mu) = \sup_{\theta \in \mathbb{R}} \{\, \theta \cdot \mu - \log(1 + \exp(\theta))\,\}. $$

They state that the stationary condition $\nabla_\mu A^*(\mu) = 0$ is attained at $\mu = \exp(\theta)/(1+\exp(\theta))$.

I'm wondering how to derive this, since $\nabla A^*(\mu)$ seems to require pushing the gradient operator through the sup, which has no closed form and may not even be differentiable!

Any guidance or trick for deriving this stationary condition would be much appreciated.


2 Answers

Best answer

I think you might be confusing something here. The authors construct the conjugate function of $A(\theta)$, denoted $A^*(\mu)$. To do so, they solve the maximization problem $(3.47)$ by setting the derivative of the objective to zero, $\nabla_{\theta}\{\theta \cdot \mu - A(\theta)\} = 0$, which gives the stationary condition $\mu = \exp(\theta)/(1+\exp(\theta))$. They then analyze the solution of this equation for different values of $\mu$, which they denote as $\theta(\mu)$.
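To make this concrete, here is a quick numerical sanity check (my own sketch, not from the book, assuming NumPy and SciPy): for a fixed $\mu \in (0,1)$ it maximizes $\theta\mu - \log(1+\exp(\theta))$ over $\theta$ and compares the maximizer against the stationarity solution $\theta(\mu) = \log(\mu/(1-\mu))$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def objective(theta, mu):
    # Inner objective of the conjugate: theta*mu - A(theta),
    # with A(theta) = log(1 + exp(theta)); log1p for numerical stability.
    return theta * mu - np.log1p(np.exp(theta))

mu = 0.3

# Numerically maximize over theta (minimize the negative).
res = minimize_scalar(lambda t: -objective(t, mu), bounds=(-50, 50), method="bounded")

theta_closed = np.log(mu / (1 - mu))  # stationarity solution theta(mu)
print(res.x, theta_closed)            # both approximately -0.8473
```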

Substituting $\theta(\mu)$ back into the objective, they arrive at the conclusion that $A^*(\mu)=\mu\log\mu+(1-\mu)\log(1-\mu)$ over the domain $\mu\in[0,1]$. When $\mu<0$ or $\mu>1$, the original conjugate maximization problem is unbounded, so $\mu$ is restricted to that interval.
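The closed form can also be verified numerically (again my own sketch, not from the answer): the value of the sup, computed by a bounded scalar optimizer, matches $\mu\log\mu+(1-\mu)\log(1-\mu)$ on a grid inside $(0,1)$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_numeric(mu):
    # A*(mu) = sup_theta { theta*mu - log(1 + exp(theta)) }, computed numerically.
    res = minimize_scalar(lambda t: -(t * mu - np.log1p(np.exp(t))),
                          bounds=(-50, 50), method="bounded")
    return -res.fun

for mu in np.linspace(0.05, 0.95, 7):
    closed = mu * np.log(mu) + (1 - mu) * np.log(1 - mu)
    print(f"mu={mu:.2f}  numeric={conjugate_numeric(mu):+.6f}  closed={closed:+.6f}")
```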

Second answer

Let $A(\theta)=\log(1+\exp(\theta))$. Then, $A$ is a proper, continuous and convex function.

In this case, it holds that $(\partial A)^{-1} = \partial A^*$, where $\partial A$ stands for the subdifferential of $A$ (see Corollary 16.30 in Bauschke and Combettes, "Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edition"). Since $A$ is differentiable with $\nabla A(\theta) = \exp(\theta)/(1+\exp(\theta))$, the logistic sigmoid, this formula gives $\nabla A^*(\mu)= (\nabla A)^{-1}(\mu) = \log(\mu/(1-\mu))$ for $\mu \in (0,1)$, with no need to differentiate through the sup.
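A short sketch of this identity (my own, assuming NumPy): since $\nabla A$ is the logistic sigmoid, $\nabla A^*$ should be its inverse, the logit, and this can be checked against a finite-difference derivative of the closed form $A^*(\mu)=\mu\log\mu+(1-\mu)\log(1-\mu)$.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))       # nabla A (logistic sigmoid)
logit   = lambda m: np.log(m / (1.0 - m))          # its inverse, (nabla A)^{-1}
A_star  = lambda m: m * np.log(m) + (1 - m) * np.log(1 - m)

mu, h = 0.7, 1e-6
fd = (A_star(mu + h) - A_star(mu - h)) / (2 * h)   # finite-difference derivative of A*
print(logit(mu), fd)                               # both approximately 0.8473
print(sigmoid(logit(mu)))                          # recovers mu = 0.7
```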