Prove that logSumExp is the smooth maximum regularized by entropy

How is the last assertion proved? Why is the smooth maximum regularized by the entropy term equal to logSumExp?

Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
1 Answer
I will go through the proof step by step, but it is worth noting first that the first equation you wrote is exactly the definition of the convex conjugate of the function $\Omega$. The result should then seem less surprising, since it is well known that the log-sum-exp function is the convex conjugate of the negative entropy (restricted to the probability simplex).
In the following, I will suppose $\gamma>0$. For $\mathbf{q}$ in the probability simplex $\Delta^n=\left\{\mathbf{q}\in\mathbb{R}^n : q_i\geq 0,\ \sum_{i=1}^n q_i=1\right\}$, let $$G(\mathbf{q})=\langle\mathbf{q}, \mathbf{x}\rangle-\Omega(\mathbf{q}) = \sum_{i=1}^{n}q_ix_i-\gamma\sum_{i=1}^{n}q_i\log (q_i),$$ where $\Omega=-\gamma H$ is the negative entropy scaled by $\gamma$.
We will find the maximum of $G$ over $\Delta^n$ by computing the gradient and setting it to $\mathbf{0}$. We have
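For concreteness, here is a minimal NumPy sketch of this objective, used in the numerical checks below (the function name `G` and all sample inputs are my own illustrative choices, not part of the original question):

```python
import numpy as np

def G(q, x, gamma):
    """Entropy-regularized objective: <q, x> - gamma * sum_i q_i * log(q_i)."""
    return q @ x - gamma * np.sum(q * np.log(q))
```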
$$\frac{\partial G}{\partial q_i} = x_i -\gamma(\log(q_i)+1)$$ and $$\frac{\partial^2 G}{\partial q_i\partial q_j}=\begin{cases}-\frac{\gamma}{q_i},&\text{if }i=j,\\0,&\text{otherwise.}\end{cases}$$ The Hessian is therefore diagonal with entries $-\frac{\gamma}{q_i}<0$ (recall $q_i>0$), hence negative definite; $G$ is strictly concave, which ensures that the stationary point we compute is actually the maximum. Setting the gradient to $\mathbf{0}$ yields $q_i^* = \exp\left(\frac{x_i}{\gamma}-1\right)$; however, the resulting $\mathbf{q}^*$ might not be a probability distribution. Enforcing the constraint $\sum_{i=1}^nq_i=1$ with a Lagrange multiplier $\lambda$ changes the stationarity condition to $x_i-\gamma(\log(q_i)+1)-\lambda=0$, i.e. $q_i^*=\exp\left(\frac{x_i-\lambda}{\gamma}-1\right)$, and choosing $\lambda$ so that the entries sum to $1$ amounts exactly to normalizing: $$q_i^* = \frac{\exp\left(\frac{x_i}{\gamma}-1\right)}{\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}-1\right)}=\frac{\exp\left(\frac{x_i}{\gamma}\right)}{\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right)}.$$
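As a sanity check (my own addition, under the definitions above), the closed-form gradient can be compared against central finite differences at an interior point of the simplex:

```python
import numpy as np

def G(q, x, gamma):
    # Same objective as above: <q, x> - gamma * sum_i q_i * log(q_i)
    return q @ x - gamma * np.sum(q * np.log(q))

def grad_G(q, x, gamma):
    # Closed-form gradient: dG/dq_i = x_i - gamma * (log(q_i) + 1)
    return x - gamma * (np.log(q) + 1.0)

x = np.array([0.3, -1.2, 2.0, 0.8])
gamma = 0.5
q = np.array([0.1, 0.2, 0.3, 0.4])   # an interior point of the simplex

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (G(q + eps * e, x, gamma) - G(q - eps * e, x, gamma)) / (2 * eps)
    for e in np.eye(4)
])
print(np.allclose(numeric, grad_G(q, x, gamma), atol=1e-5))  # True
```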
This $\mathbf{q}^*$ is a stationary point of the constrained problem and belongs to the probability simplex, so by strict concavity it is the maximum. Writing $Z=\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right)$, we have $\log(q_i^*)=\frac{x_i}{\gamma}-\log Z$; in $G(\mathbf{q}^*)$ the $x_i$ terms cancel, and since $\sum_{i=1}^nq_i^*=1$ only the $\log Z$ term survives. Hence, you get $$\mathrm{max}_{-\gamma H}(\mathbf{x}) = G(\mathbf{q}^*) = \sum_{i=1}^{n}\frac{\exp\left(\frac{x_i}{\gamma}\right)}{Z}\,x_i-\gamma\sum_{i=1}^{n}\frac{\exp\left(\frac{x_i}{\gamma}\right)}{Z}\left(\frac{x_i}{\gamma}-\log Z\right)\\=\gamma\log Z=\gamma\log \sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right),$$
as desired.
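Finally, a quick numerical confirmation of the result (again my own sketch; `softmax` is a hand-rolled helper and the test vector is arbitrary): the value of $G$ at the softmax point matches $\gamma\log\sum_i\exp(x_i/\gamma)$, and no random point of the simplex does better.

```python
import numpy as np

def G(q, x, gamma):
    return q @ x - gamma * np.sum(q * np.log(q))

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

x = np.array([0.3, -1.2, 2.0, 0.8])
gamma = 0.5

q_star = softmax(x / gamma)
lse = gamma * np.log(np.sum(np.exp(x / gamma)))
print(np.isclose(G(q_star, x, gamma), lse))    # True: G(q*) = gamma * logsumexp(x / gamma)

# q* should dominate random points of the simplex
rng = np.random.default_rng(0)
for _ in range(1000):
    q = rng.dirichlet(np.ones(x.size))
    q = np.clip(q, 1e-12, None); q /= q.sum()  # keep log(q) finite
    assert G(q, x, gamma) <= lse + 1e-9
print("no simplex sample exceeded gamma * logsumexp(x / gamma)")
```

In practice one would compute the right-hand side with `scipy.special.logsumexp` rather than the naive `log(sum(exp(...)))`, to avoid overflow for large $x_i/\gamma$.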