Conjugate of negative normalized entropy


Given $$f(x) = \sum_{i=1}^n x_i \log \left( \frac{x_i}{\sum_{j=1}^n x_j} \right), \qquad \operatorname{dom} f = \mathbb{R}^n_{++},$$ find $f^*(y)$.


$$f^*(y) = \sup_x ( y^T x - f(x) ) =$$ $$ \sup_x \left( y^T x - \sum_{i=1}^n x_i \log \left( \frac{x_i}{\sum_{j=1}^n x_j} \right) \right) =$$ $$ \sup_x \sum_{i=1}^n x_i \left(y_i - \log \left( \frac{x_i}{\sum_{j=1}^n x_j} \right) \right) =$$ $$ \sup_x \sum_{i=1}^n x_i \left(\log e^{y_i} - \log \left( \frac{x_i}{\sum_{j=1}^n x_j} \right) \right) =$$ $$ \sup_x \sum_{i=1}^n x_i \log \left( \frac{e^{y_i}\sum_{j=1}^n x_j}{x_i} \right)$$ Unfortunately, I can't figure out what to do next. This is problem 3.41 from Convex Optimization by Boyd and Vandenberghe, and there is an answer to this problem available:

$$ \begin{equation*} f^*(y) = \begin{cases} 0 & \sum_{i=1}^n e^{y_i} \leq 1 \\ +\infty & \text{otherwise} \end{cases} \end{equation*}$$
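Not part of the original question, but the stated answer is easy to sanity-check numerically before working through the derivations. The sketch below (my own check) scans the objective $y^Tx - f(x)$ along the ray $x = r\,e^{y}$: when $\sum_i e^{y_i} \leq 1$ the values stay at or below $0$, and when $\sum_i e^{y_i} > 1$ they grow without bound.

```python
import numpy as np

def f(x):
    """Negative normalized entropy: sum_i x_i * log(x_i / sum_j x_j)."""
    return np.sum(x * np.log(x / x.sum()))

def conjugate_lower_bound(y, r_values):
    """Lower bound on f*(y) = sup_x (y @ x - f(x)),
    scanning the ray x = r * exp(y) over the given values of r."""
    return max(y @ (r * np.exp(y)) - f(r * np.exp(y)) for r in r_values)

rs = np.logspace(-6, 6, 100)

y_feas = np.log(np.array([0.3, 0.3, 0.3]))  # sum(exp(y)) = 0.9 <= 1
y_infeas = np.log(np.array([0.6, 0.6]))     # sum(exp(y)) = 1.2 > 1

print(conjugate_lower_bound(y_feas, rs))    # close to 0 (and <= 0)
print(conjugate_lower_bound(y_infeas, rs))  # very large: the sup is +infinity
```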


In the case $\sum_{i=1}^n e^{y_i} > 1$, setting $x_i = r \cdot e^{y_i}$ gives the objective value $r \cdot \sum_{i=1}^n e^{y_i} \cdot \log \sum_{i=1}^n e^{y_i}$, which tends to infinity as $r \rightarrow \infty$, so $f^*(y) = +\infty$.
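As a quick numerical check of my own (with an arbitrary $y$ and $r$), substituting $x_i = r\,e^{y_i}$ into the objective indeed reproduces the closed form $r \sum_i e^{y_i} \log \sum_i e^{y_i}$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=4)
r = 2.5

x = r * np.exp(y)                                   # x_i = r * exp(y_i)
objective = y @ x - np.sum(x * np.log(x / x.sum()))
closed_form = r * np.exp(y).sum() * np.log(np.exp(y).sum())

print(np.isclose(objective, closed_form))  # True
```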

Write $S(x) = \sum_{j=1}^n x_j$. The first derivatives of the $\sup$ objective are \begin{align} \frac{\partial}{\partial x_i} &\left( y^T x - x^T \log x + \sum_{k} x_k \log S(x) \right) = y_i - \log x_i + \log S(x) \ ; \end{align} the gradient being zero thus means that $\forall i: \frac{x_i}{S(x)} = e^{y_i}$. Summing over $i$ shows this can be fulfilled if and only if $\sum_i e^{y_i} = 1$, and then exactly by $x_i = r \cdot e^{y_i}$ for some $r>0$. The second derivative of the objective is \begin{align} \frac{\partial^2 \left( y^T x - x^T \log x + \sum_{k} x_k \log S(x) \right)}{\partial x_i \partial x_j}(x) = -\frac{\delta_{ij}}{x_j} + \frac{1}{S(x)} \end{align} which is negative semidefinite: for $v \in \mathbb{R}^n$, \begin{align} \sum_{ij} v_i \left(-\frac{\delta_{ij}}{x_j} + \frac{1}{S(x)}\right) v_j &= - \sum_{i}\frac{v_i^2}{x_i} + \frac{\left(\sum_{i} v_i\right)^2}{S(x)} \leq 0 \end{align} where we used Jensen's inequality on the squaring function: \begin{align} \sum_{i}\frac{v_i^2}{x_i} &= \frac{1}{S(x)}\sum_{i}\frac{v_i^2 S(x)}{x_i} \\ &= \frac{1}{S(x)}\sum_{i}\frac{x_i}{S(x)}\left(\frac{v_i S(x)}{x_i}\right)^2 \geq \frac{1}{S(x)}\left(\sum_{i}\frac{x_i}{S(x)}\frac{v_i S(x)}{x_i}\right)^2 = \frac{\left(\sum_{i} v_i\right)^2}{S(x)} \ . \end{align}
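A small numerical sketch of my own, at an arbitrary point $x \succ 0$, confirms that this Hessian is negative semidefinite, with $x$ itself spanning a null direction (so it is not strictly negative definite):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 2.0, size=5)  # arbitrary point in R^n_++
S = x.sum()

# Hessian of the objective: H_ij = -delta_ij / x_i + 1 / S(x)
H = -np.diag(1.0 / x) + 1.0 / S

# All eigenvalues are <= 0 (negative semidefinite) ...
print(np.linalg.eigvalsh(H).max() <= 1e-12)  # True

# ... and v = x is a null direction: H @ x = -1 + 1 = 0 componentwise
print(np.allclose(H @ x, 0.0))  # True
```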

Thus, in case $\sum_i e^{y_i} = 1$ we have a global maximum with value $f^*(y) = \sum_{j=1}^n r \cdot e^{y_j} y_j - \sum_{i=1}^n r \cdot e^{y_i}\log e^{y_i} = 0$.
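And indeed, picking any $y$ with $\sum_i e^{y_i} = 1$ exactly, the objective vanishes along the whole ray $x = r\,e^{y}$ (a quick check of my own):

```python
import numpy as np

y = np.log(np.array([0.2, 0.3, 0.5]))  # chosen so that sum(exp(y)) = 1
r = 1.7                                # any r > 0 works

x = r * np.exp(y)
value = y @ x - np.sum(x * np.log(x / x.sum()))
print(abs(value) < 1e-10)  # True: the objective vanishes on this ray
```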

In the last case $\sum_i e^{y_i} < 1$, there is $\hat{y} \succ y$ with $\sum_i e^{\hat{y}_i} = 1$. For any $x\succ0$ the term $x^T y - \sum_{i=1}^n x_i\log \frac{x_i}{S(x)}$ is increasing in each $y_i$. Therefore, $f^*(y) \leq f^*(\hat{y}) = 0$. Again taking $x_i = r \cdot e^{y_i}$ (also possible: simply $x_i = r$), we see that $0$ is attained as the supremum in the limit $r \rightarrow 0$.


The derivation is straightforward if you break it down into small steps. The first step is decomposing $f$:

$$f(x) = \left( 1^Tx \right) g\left( \frac{x}{1^Tx}\right) \text{ with } g(x)=\sum_{i=1}^n g_i(x) \text{ and } g_i(x) = x_i \log(x_i).$$
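This perspective identity $f(x) = (1^Tx)\,g(x/1^Tx)$ is easy to verify numerically (a sketch of my own, with a random $x \succ 0$):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.1, 3.0, size=5)  # arbitrary x in R^n_++

f = lambda x: np.sum(x * np.log(x / x.sum()))  # normalized negative entropy
g = lambda u: np.sum(u * np.log(u))            # plain negative entropy

print(np.isclose(f(x), x.sum() * g(x / x.sum())))  # True
```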

We start by using that the conjugate of $x\log x$ is $e^{v-1}$, so: $$g_i^*(v) = \begin{cases} e^{v_i-1} & \text{if } v_j =0 \;\forall j \neq i \\ \infty & \text{otherwise.} \end{cases}$$ Therefore, $g^*(v) = \sum_{i=1}^n e^{v_i-1}$. The main step is computing the conjugate of the perspective, introducing $y = 1^Tx$ via a multiplier $z$ and swapping $\sup$ and $\inf$ (justified here by strong duality): \begin{align} f^*(v) &= \sup_{x} \left\{x^Tv- \left(1^Tx\right) g\left( \frac{x}{1^Tx}\right)\right\} \\ &= \sup_{x,y\geq 0} \inf_z \left\{x^Tv- y g\left( \frac{x}{y}\right) + z(y-1^Tx)\right\} \\ &= \inf_z \sup_{x,y\geq 0} \left\{x^Tv- y g\left( \frac{x}{y}\right) + z(y-1^Tx)\right\} \\ &= \inf_z \sup_{y\geq 0} \left\{zy + y \sup_x \left\{(x/y)^T(v-z1)- g\left( \frac{x}{y}\right)\right\} \right\} \\ &= \inf_z \sup_{y\geq 0} \left\{zy + y g^*(v-z1) \right\} \\ &= \inf_z \sup_{y\geq 0} \left\{ zy + y \sum_{i=1}^n e^{v_i-z-1} \right\} \\ &= \inf_z \begin{cases}0 & \text{if } \sum_{i=1}^n e^{v_i-z-1} + z \leq 0 \\ \infty & \text{otherwise.}\end{cases} \end{align} The last step is to find when there exists a $z$ such that $h(z) := \sum_{i=1}^n e^{v_i-z-1} + z \leq 0$, which is equivalent to checking whether $\min_z h(z) \leq 0$. Setting the derivative of $h$ to $0$ we obtain $\sum_{i=1}^n e^{v_i-z^*-1} = 1$, i.e. $z^* = \log(\sum_{i=1}^n e^{v_i})-1$, so $h(z^*) = 1 + z^* = \log(\sum_{i=1}^n e^{v_i})$. Hence $f^*(v) = 0$ if $\sum_{i=1}^n e^{v_i} \leq 1$, and $f^*(v) = \infty$ otherwise.
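The last step can also be checked numerically. In this sketch of my own, a grid search over $z$ confirms that the stationary point $z^* = \log(\sum_i e^{v_i}) - 1$ minimizes $h$, and that the minimum value equals $\log(\sum_i e^{v_i})$:

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(size=4)

def h(z):
    return np.exp(v - z - 1).sum() + z

z_star = np.log(np.exp(v).sum()) - 1  # stationary point from the derivation

# z* minimizes h on a fine grid around it ...
zs = np.linspace(z_star - 5.0, z_star + 5.0, 20001)
z_num = min(zs, key=h)
print(np.isclose(z_num, z_star, atol=1e-3))  # True

# ... and the minimum value is log(sum_i exp(v_i))
print(np.isclose(h(z_star), np.log(np.exp(v).sum())))  # True
```

So $\min_z h(z) \leq 0$ holds exactly when $\sum_i e^{v_i} \leq 1$, reproducing the stated answer.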