Prove an identity for the infimum of the entropy function


I came across the following identity (which can be seen as an entropy minimization) that I would like to prove, but I am not sure whether my reasoning is correct:

\begin{equation} \inf_{\mathbf{1}^Tz=1} \sum_{i=1}^{m}z_i\log(z_i)=\log\left( \frac{1}{m} \right) \end{equation}

To start, here are my findings so far. Without loss of generality, we can assume $z_1\leq\ldots\leq z_m$. Then: \begin{equation} 1=\mathbf{1}^Tz \geq mz_1 \Rightarrow z_1\leq \frac{1}{m} \end{equation} and, using $\log(z_i)\geq\log(z_1)$ together with $z_1+\ldots+z_m=1$, \begin{equation} \sum_{i=1}^{m}z_i\log(z_i) \geq (z_1+\ldots+z_m)\log(z_1) = \log(z_1) \end{equation}

Here, I would like to have $\log\left(z_1\right)\geq\log\left( \frac{1}{m} \right)$ so that taking the infimum yields the desired identity, but this cannot hold: we already showed $z_1\leq \frac{1}{m}$, which gives $\log(z_1)\leq\log\left( \frac{1}{m} \right)$, i.e. the opposite inequality.
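For what it's worth, the identity itself seems to check out numerically. The following sketch (using NumPy; the helper name `neg_entropy` is mine) samples random points on the simplex and confirms that the objective never drops below $\log\left(\frac{1}{m}\right)$, while my lower bound $\log(z_1)$ always lies on or below that value:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5

def neg_entropy(z):
    # sum_i z_i log(z_i), with the convention 0*log(0) = 0
    return np.sum(np.where(z > 0, z * np.log(np.maximum(z, 1e-300)), 0.0))

# the claimed infimum, attained at the uniform distribution
claimed = np.log(1.0 / m)
assert np.isclose(neg_entropy(np.full(m, 1.0 / m)), claimed)

# random points on the simplex never go below the claimed value,
# while my bound log(min z_i) is valid but sits BELOW log(1/m)
for _ in range(10_000):
    z = rng.dirichlet(np.ones(m))
    assert neg_entropy(z) >= claimed - 1e-12
    assert np.log(z.min()) <= claimed + 1e-12  # bound points the wrong way
```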

What am I doing or thinking wrong here?

1 Answer

As suggested in the comments, this minimization is related to the well-known entropy maximization problem (entropy being a measure of information):

\begin{equation} \sup_{\mathbf{1}^Tz=1} f(z), \qquad f(z) = -\sum_{i=1}^{m} z_i\log(z_i) \end{equation}

where $z_1,\ldots,z_m$ can be thought of as probabilities such that the total probability adds to one (this is indeed our constraint).

For the objective to be well defined, there is also the implicit constraint $z_i>0$; on the boundary we use the convention $0\log(0)=0$, representing the case when an outcome conveys no information.

Since $f(z)$ is smooth on $z_i>0$ and the constraint is affine, we can look for stationary points of the Lagrangian $L(z,\lambda)=f(z)+\lambda\left(\mathbf{1}^Tz-1\right)$. The first-order optimality condition gives

\begin{equation} \frac{\partial L(z,\lambda)}{\partial z_i} = -\log(z_i) - 1 + \lambda = 0 \Rightarrow z_i = e^{\lambda-1} \end{equation}

which is independent of the index $i$. In other words, at the optimum all the probabilities are equal. This stationary point is a maximum since $\frac{\partial^2 f(z)}{\partial z_i^2} = -\frac{1}{z_i} < 0$ for $z_i>0$; indeed $f$ is concave, so the supremum is attained there and is finite.

The common value $z_i = e^{\lambda-1}$ is then pinned down by the constraint:

\begin{equation} 1=\mathbf{1}^Tz= \sum_{i=1}^{m} z_i = me^{\lambda-1} \Rightarrow z_i = e^{\lambda-1}=\frac{1}{m} \end{equation}
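As a sanity check on this optimum, one can hand the constrained problem to a numerical solver. This is only a sketch, assuming SciPy's SLSQP method is available; it minimizes the original objective $\sum_i z_i\log(z_i)$ directly and lands on the uniform distribution:

```python
import numpy as np
from scipy.optimize import minimize

m = 4

def neg_entropy(z):
    # objective of the original problem: sum_i z_i log(z_i)
    z = np.maximum(z, 1e-12)  # guard against log(0) near the boundary
    return np.sum(z * np.log(z))

res = minimize(
    neg_entropy,
    x0=np.random.default_rng(1).dirichlet(np.ones(m)),  # feasible start
    method="SLSQP",
    bounds=[(0.0, 1.0)] * m,
    constraints=[{"type": "eq", "fun": lambda z: z.sum() - 1.0}],
)

assert np.allclose(res.x, 1.0 / m, atol=1e-4)           # uniform optimum
assert np.isclose(res.fun, np.log(1.0 / m), atol=1e-4)  # value log(1/m)
```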

Plugging this back into the objective function, we get the maximum entropy

\begin{equation} \sum_{i=1}^{m} -\frac{1}{m}\log\left(\frac{1}{m}\right) = -\log\left(\frac{1}{m}\right) = \log(m) \end{equation}

or, in the case of the original minimization problem, the optimal value $\log\left(\frac{1}{m}\right)$.
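Spelling out that last step: since the two objectives differ only by a sign,

\begin{equation} \inf_{\mathbf{1}^Tz=1} \sum_{i=1}^{m}z_i\log(z_i) = -\sup_{\mathbf{1}^Tz=1} \sum_{i=1}^{m} -z_i\log(z_i) = -\left(-\log\left(\frac{1}{m}\right)\right) = \log\left(\frac{1}{m}\right) \end{equation}

with the infimum attained at the same uniform point $z_i=\frac{1}{m}$.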