I am working through example 3.10 (page 71) of a book on variational inference in statistics and the authors discuss taking the gradient of the following function with respect to $\mu$: $$ A^*(\mu) = \sup_{\theta \in \mathbb{R}} \{\, \theta \cdot \mu - \log(1 + \exp(\theta))\,\}. $$
They state that the stationary condition $\nabla_\mu A^*(\mu) = 0$ is attained at the location $\mu = \exp(\theta)/(1+\exp(\theta))$.
I'm wondering how to derive this since $\nabla A^*(\mu)$ seems to require pushing the gradient operator through the "sup" operator which has no closed form and which may not even be differentiable!
Any guidance or trick for how this stationary condition is derived would be very much appreciated.
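I think you might be confusing something here. The authors construct the conjugate function of $A(\theta)$, denoted $A^*(\mu)$. To do so, they solve the maximization problem $(3.47)$ by setting the gradient of the objective with respect to $\theta$ to zero, i.e. $\nabla_{\theta}\{\theta\mu - A(\theta)\}=0$ with $A(\theta)=\log(1+\exp(\theta))$. No gradient with respect to $\mu$ is taken, so nothing has to be pushed through the sup. They then analyze the solution of this equation for different values of $\mu$, which they denote as $\theta(\mu)$.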
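As a sketch of that step (my own working, not copied from the book), differentiating the objective inside the sup with respect to $\theta$ gives the condition quoted in the question:

$$
\begin{aligned}
\frac{\partial}{\partial\theta}\bigl[\theta\mu - \log(1+\exp(\theta))\bigr] &= \mu - \frac{\exp(\theta)}{1+\exp(\theta)} = 0 \\
\Longrightarrow\quad \mu &= \frac{\exp(\theta)}{1+\exp(\theta)} \\
\Longrightarrow\quad \theta(\mu) &= \log\frac{\mu}{1-\mu}, \qquad \mu\in(0,1).
\end{aligned}
$$

Since the objective is concave in $\theta$, this stationary point is the maximizer whenever it exists, which is the case exactly for $\mu\in(0,1)$.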
They arrive at the conclusion that $A^*(\mu)=\mu\log\mu+(1-\mu)\log(1-\mu)$ over the domain $\mu\in[0,1]$ (with the convention $0\log 0=0$ at the endpoints). When $\mu<0$ or $\mu>1$ the conjugate maximization problem is unbounded above, so $A^*(\mu)=+\infty$ there and $\mu$ is restricted to that interval.
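If it helps, the closed form can be checked numerically. The snippet below is a minimal sketch, not anything from the book: the function names, the search bracket `[-50, 50]`, and the choice of bisection are all my own assumptions. It maximizes $\theta\mu-\log(1+\exp(\theta))$ over $\theta$ by finding the root of the derivative $\mu-\exp(\theta)/(1+\exp(\theta))$, then compares the result with $\mu\log\mu+(1-\mu)\log(1-\mu)$.

```python
import math


def log1pexp(theta):
    """Numerically stable log(1 + exp(theta))."""
    if theta > 0:
        return theta + math.log1p(math.exp(-theta))
    return math.log1p(math.exp(theta))


def conjugate_numeric(mu, lo=-50.0, hi=50.0, iters=200):
    """Maximize theta*mu - log(1 + exp(theta)) over theta for mu in (0, 1).

    The derivative in theta is mu - sigmoid(theta), which is strictly
    decreasing, so its root can be found by bisection.
    The bracket [lo, hi] is an illustrative assumption.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        sigmoid = 1.0 / (1.0 + math.exp(-mid))
        if mu - sigmoid > 0:
            lo = mid  # derivative still positive: maximizer lies to the right
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta, theta * mu - log1pexp(theta)


def conjugate_closed_form(mu):
    """Closed form mu*log(mu) + (1 - mu)*log(1 - mu), valid for mu in (0, 1)."""
    return mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)


if __name__ == "__main__":
    for mu in (0.1, 0.3, 0.5, 0.9):
        theta, value = conjugate_numeric(mu)
        print(mu, theta, math.log(mu / (1.0 - mu)), value, conjugate_closed_form(mu))
```

For each $\mu$ the printed maximizer should agree with $\log(\mu/(1-\mu))$ and the maximum with the closed form, up to floating-point error.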