Prove that logSumExp is the smooth maximum regularized by entropy

[Image: the definition of the entropy-regularized smooth maximum, $\mathrm{max}_{\Omega}(\textbf{x}) = \max_{\textbf{q}\in\Delta^n}\ \langle\textbf{q},\textbf{x}\rangle-\Omega(\textbf{q})$ with $\Omega(\textbf{q})=-\gamma H(\textbf{q})=\gamma\sum_i q_i\log q_i$, followed by the assertion that this equals $\gamma\log\sum_{i=1}^n\exp\left(\frac{x_i}{\gamma}\right)$.]

How is the last assertion proved? Why is the smooth maximum regularized by the entropy term equal to logSumExp?


I will go through the proof step by step, but it is worth first noting that the first equation you wrote is the definition of the convex conjugate of the function $\Omega$ (restricted to the probability simplex), evaluated at $\textbf{x}$. The result may then seem less surprising, since it is well known that the log-sum-exp function is the convex conjugate of the negative entropy.

In the following, I will suppose $\gamma>0$. Let $$G(\textbf{q})=\langle\textbf{q}, \textbf{x}\rangle-\Omega(\textbf{q}) = \sum_{i=1}^{n}q_ix_i-\gamma\sum_{i=1}^{n}q_i\log (q_i).$$

We are going to find the max by computing the gradient and setting it to $0$. We have

$$\frac{\partial G}{\partial q_i} = x_i -\gamma(\log(q_i)+1)$$ and $$\frac{\partial^2 G}{\partial q_i\partial q_j}=\begin{cases}-\frac{\gamma}{q_i},&\text{if }i=j,\\0,&\text{otherwise.}\end{cases}$$ The Hessian is therefore negative definite on the interior of the simplex (it is diagonal with entries $-\frac{\gamma}{q_i}<0$), so $G$ is strictly concave and the stationary point we compute is indeed the maximum. Setting the unconstrained gradient to $\textbf{0}$ yields $q_i^* = \exp\left(\frac{x_i}{\gamma}-1\right)$. However, the resulting $\textbf{q}^*$ need not be a probability distribution. The constraint $\sum_{i=1}^nq_i^*=1$ is handled by a Lagrange multiplier $\lambda$: stationarity of $G(\textbf{q})-\lambda\left(\sum_{i=1}^n q_i-1\right)$ gives $q_i^*=\exp\left(\frac{x_i-\lambda}{\gamma}-1\right)$, and choosing $\lambda$ so that the coordinates sum to $1$ amounts exactly to normalizing: $$q_i^* = \frac{\exp\left(\frac{x_i}{\gamma}-1\right)}{\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}-1\right)}=\frac{\exp\left(\frac{x_i}{\gamma}\right)}{\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right)}.$$
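The stationarity condition on the simplex can also be checked numerically: at the normalized $\textbf{q}^*$, the partial derivatives $x_i-\gamma(\log q_i+1)$ all take one common value, the Lagrange multiplier of the constraint $\sum_i q_i=1$. A minimal Python sketch (the test vector and $\gamma$ below are arbitrary choices of mine):

```python
import math

def softmax(x, gamma):
    # q_i* = exp(x_i/gamma) / sum_j exp(x_j/gamma), max-shifted for stability
    m = max(x)
    e = [math.exp((xi - m) / gamma) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

x = [1.0, -0.5, 2.0, 0.3]   # arbitrary test point
gamma = 0.7                  # arbitrary temperature > 0
q = softmax(x, gamma)

# On the simplex, stationarity means dG/dq_i = x_i - gamma*(log q_i + 1)
# is the same for every i (it equals the Lagrange multiplier).
grads = [xi - gamma * (math.log(qi) + 1) for xi, qi in zip(x, q)]
print(max(grads) - min(grads))  # ~0
print(sum(q))                   # ~1
```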

This normalized $\textbf{q}^*$ belongs to the probability simplex and satisfies the constrained stationarity condition, so it is the maximizer. Since $\log q_i^*=\frac{x_i}{\gamma}-\log\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right)$, you get $$\mathrm{max}_{-\gamma H}(\textbf{x}) = G(\textbf{q}^*) = \sum_{i=1}^{n}q_i^*x_i-\gamma\sum_{i=1}^{n}q_i^*\left(\frac{x_i}{\gamma}-\log \sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right)\right)=\gamma\log \sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right),$$ because the $x_i$ terms cancel and the remaining term is $\gamma\log\sum_{j=1}^n\exp\left(\frac{x_j}{\gamma}\right)$ multiplied by $\sum_{i=1}^n q_i^*=1$,

as desired.
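As a final sanity check (a small Python sketch of my own, not part of the proof), one can verify numerically that $G(\textbf{q}^*)$ equals $\gamma\log\sum_j\exp(x_j/\gamma)$ and that other points of the simplex do no better:

```python
import math

def G(q, x, gamma):
    # G(q) = <q, x> - gamma * sum_i q_i log q_i
    return (sum(qi * xi for qi, xi in zip(q, x))
            - gamma * sum(qi * math.log(qi) for qi in q))

def softmax(x, gamma):
    # Maximizer q_i* = exp(x_i/gamma) / sum_j exp(x_j/gamma)
    m = max(x)  # max-shift for numerical stability
    e = [math.exp((xi - m) / gamma) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

def gamma_logsumexp(x, gamma):
    # gamma * log sum_j exp(x_j / gamma), computed stably
    m = max(x)
    return m + gamma * math.log(sum(math.exp((xi - m) / gamma) for xi in x))

x = [1.0, -0.5, 2.0, 0.3]   # arbitrary test point
gamma = 0.7                  # arbitrary temperature > 0
q_star = softmax(x, gamma)

# The maximum value equals gamma * logSumExp(x / gamma) ...
print(abs(G(q_star, x, gamma) - gamma_logsumexp(x, gamma)))  # ~0
# ... and another simplex point, e.g. uniform, scores lower:
uniform = [1.0 / len(x)] * len(x)
print(G(uniform, x, gamma) < G(q_star, x, gamma))  # True
```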