Prove that the entropy of a distribution decreases under a certain operation


I want to prove that, for the distributions described below, applying a transform that "favours" large probabilities results in a lower-entropy distribution. It feels obvious, but I'm having a hard time formalizing a proof. I've tried using the concavity of the entropy, but I'm not sure how to take advantage of it.

More formally:

Let $a,p\in \mathbb{R}^n$ be such that $0 < a_1 < \dots < a_n$, $0 < p_1 < \dots < p_n < 1$, and $\sum_j p_j = 1$.

Let $H[p] = -\sum_j p_j\log(p_j)$

Now let $q_j = \frac{a_jp_j}{\sum_j a_jp_j}$.

Show that $H[q] < H[p]$ for all $p,a$ described above.

Example: $n=2$, $a_1=1/2$, $a_2=1$, $p_1=1/3$, $p_2=2/3$ (using $\log_2$).

$H[p] = \frac{1}{3}\log(3)+\frac{2}{3}(\log(3)-\log(2)) = \log(3)-\frac{2}{3}\log(2) \approx 0.92$

$H[q] = \frac{1}{5}\log(5)+\frac{4}{5}(\log(5)-\log(4)) = \log(5)-\frac{4}{5}\log(4) \approx 0.72 < H[p]$
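The worked example can be checked numerically (a small sketch; `entropy` is just the base-2 Shannon entropy from the question):

```python
import math

def entropy(dist):
    """Shannon entropy in bits: H[p] = -sum_j p_j log2(p_j)."""
    return -sum(pi * math.log2(pi) for pi in dist if pi > 0)

# Example from the question: n = 2, a = (1/2, 1), p = (1/3, 2/3).
a = [0.5, 1.0]
p = [1/3, 2/3]

# q_j = a_j p_j / sum_k a_k p_k
z = sum(ai * pi for ai, pi in zip(a, p))
q = [ai * pi / z for ai, pi in zip(a, p)]

print(q)           # ≈ [0.2, 0.8]
print(entropy(p))  # ≈ 0.918
print(entropy(q))  # ≈ 0.722, which is indeed smaller
```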

EDIT:

My particular problem is in fact a special case where I use $a_j=p_j$, but I want a general solution.


BEST ANSWER

Consider the transformation $p_i\to\frac{p_ia_i^x}{\sum_ip_ia_i^x}$ for $x\in[0,1]$, which continuously deforms $p$ (at $x=0$) into $q$ (at $x=1$). If we can show that the derivative of the entropy with respect to $x$ is negative at $x=0$, it follows that it is negative for all $x$: the distribution at any $x_0$ is again increasing and of the same form (with the same $a$), so rescaling $x$ to shift $x_0$ to $0$ lets the same argument apply there.

With $\frac{\mathrm d}{\mathrm dx}(f(x)\log f(x))=(1+\log f(x))f'(x)$, we have

\begin{eqnarray} && \left.\frac{\mathrm d}{\mathrm dx}\left(-\sum_i\frac{p_ia_i^x}{\sum_kp_ka_k^x}\log\frac{p_ia_i^x}{\sum_kp_ka_k^x}\right)\right|_{x=0} \\ &=& \left.-\sum_i\left(1+\log\frac{p_ia_i^x}{\sum_kp_ka_k^x}\right)\frac{p_ia_i^x}{\sum_kp_ka_k^x}\left(\log a_i-\frac{\sum_kp_ka_k^x\log a_k}{\sum_kp_ka_k^x}\right)\right|_{x=0} \\ &=& -\sum_i\left(1+\log p_i\right)p_i\left(\log a_i-\sum_kp_k\log a_k\right) \\ &=& -\left(E\left[\log p_i\log a_i\right]-E\left[\log p_i\right]E\left[\log a_i\right]\right) \\ &=&-\operatorname{Cov}(\log p_i,\log a_i)\;. \end{eqnarray}

This covariance is positive since $\log p_i$ and $\log a_i$ are both strictly increasing in $i$ (Chebyshev's sum inequality). This in fact proves the more general result that the entropy is reduced as long as the logarithms of the $p_i$ and the $a_i$ are positively correlated.
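As a numerical sanity check (not part of the answer itself), one can compare a finite-difference estimate of $\frac{\mathrm d H}{\mathrm dx}$ at $x=0$ against $-\operatorname{Cov}(\log p_i,\log a_i)$, using natural logarithms and an arbitrary example distribution:

```python
import math

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in dist)

def path(p, a, x):
    """The interpolating distribution p_i a_i^x / sum_k p_k a_k^x."""
    w = [pi * ai**x for pi, ai in zip(p, a)]
    z = sum(w)
    return [wi / z for wi in w]

# Example values (assumed, just for the check): both sequences increasing.
p = [0.1, 0.2, 0.3, 0.4]
a = [1.0, 2.0, 3.0, 4.0]

# Central finite difference of H(x) at x = 0.
h = 1e-6
dH = (entropy(path(p, a, h)) - entropy(path(p, a, -h))) / (2 * h)

# Cov(log p_i, log a_i) under the distribution p.
E = lambda f: sum(pi * fi for pi, fi in zip(p, f))
lp = [math.log(pi) for pi in p]
la = [math.log(ai) for ai in a]
cov = E([u * v for u, v in zip(lp, la)]) - E(lp) * E(la)

print(dH, -cov)  # the two values should agree to several decimal places
```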

ANSWER

I think I have a solution using some calculus, please correct me if something is wrong:

Assume we have some distribution with $\sum_j p_j = 1$, where the $p_j$ are increasing and each $p_j < 1$. Let $p_{\epsilon,i,j} = p + \epsilon(-\mathbf{1}_i+\mathbf{1}_j)$, i.e. we remove "a tiny bit" $\epsilon$ from index $i < j$ and add it to index $j$.

The derivative of the entropy with respect to $\epsilon$ is then:

$\frac{\partial H[p_{\epsilon,i,j}]}{\partial\epsilon} = \log(p_i-\epsilon)-\log(p_j+\epsilon) = \log(\frac{p_i-\epsilon}{p_j+\epsilon})$

Since $p_i < p_j$ (and $\epsilon \geq 0$), this derivative is negative, so the entropy must decrease!

(Inspiration: https://homes.cs.washington.edu/~jrl/teaching/cse599swi16/notes/lecture1.pdf)

Now, the original operation can be reframed as a composition of a finite number of such transfers: since $q_j/p_j = a_j/\sum_k a_kp_k$ is increasing in $j$, the map from $p$ to $q$ only moves mass from lower indices (smaller probabilities) to higher indices (larger probabilities). Since each transfer decreases the entropy, the original operation must decrease the entropy!
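The transfer step above can be sketched numerically: repeatedly move a small amount of mass from a lower-probability index to a higher-probability one and observe that the entropy drops at every step (the distribution and step size are assumptions, just for illustration):

```python
import math

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in dist if pi > 0)

p = [0.1, 0.2, 0.3, 0.4]  # increasing, sums to 1
eps = 0.01
i, j = 0, 3               # transfer from index i to index j, p[i] < p[j]

H_prev = entropy(p)
for _ in range(5):        # five successive transfers of size eps
    p[i] -= eps
    p[j] += eps
    H_now = entropy(p)
    assert H_now < H_prev  # entropy strictly decreases at each step
    H_prev = H_now

print(p, H_prev)
```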