Entropy of probability distributions


Let $\mu_1, \mu_2$ be two probability distributions on a sample space $X$ and let $0 < \alpha < 1$. Define the entropy of a probability distribution $\mu$ to be $$H(\mu) = - \sum_{t \in X} \mu(t) \log(\mu(t))$$ Define a new probability distribution $\alpha \mu_1 + (1 - \alpha)\mu_2$. I am trying to show that $$ H(\alpha \mu_1 + (1- \alpha)\mu_2) \geq \alpha H(\mu_1) + (1 - \alpha) H(\mu_2)$$ i.e that the entropy is concave.

I've tried: $$ H(\alpha \mu_1 + (1- \alpha)\mu_2) = -\sum_{t \in X} (\alpha \mu_1(t) + (1 - \alpha)\mu_2(t)) \log (\alpha \mu_1(t) + (1 - \alpha)\mu_2(t))$$ and then tried to use the fact that $\log$ is concave, so $$\log (\alpha \mu_1(t) + (1 - \alpha)\mu_2(t)) \geq \alpha \log(\mu_1(t)) + (1-\alpha)\log(\mu_2(t)),$$ but I don't seem to be getting anywhere. Any help would be appreciated!
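For what it's worth, the claimed inequality does check out numerically; here is a quick sanity check (the distributions and $\alpha$ below are made up for illustration):

```python
import math

def H(mu):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in mu if p > 0)

# Made-up example distributions on a 3-point sample space.
mu1 = [0.7, 0.2, 0.1]
mu2 = [0.1, 0.3, 0.6]
alpha = 0.4

# The mixture alpha*mu1 + (1 - alpha)*mu2, taken pointwise.
mix = [alpha * p + (1 - alpha) * q for p, q in zip(mu1, mu2)]

lhs = H(mix)                                 # H(alpha*mu1 + (1-alpha)*mu2)
rhs = alpha * H(mu1) + (1 - alpha) * H(mu2)  # alpha*H(mu1) + (1-alpha)*H(mu2)
print(lhs >= rhs)  # True: entropy of the mixture dominates the mixed entropies
```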


There are 3 best solutions below


Let $f(x)=-x\log x$ for $x>0$. Then $$f^{\prime\prime}(x)=-\frac 1 x<0$$ Therefore $f$ is concave. This yields the result.
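Spelling out how concavity of $f$ yields the result: since $H(\mu)=\sum_{t\in X} f(\mu(t))$, applying $f(\alpha a+(1-\alpha)b)\geq \alpha f(a)+(1-\alpha)f(b)$ with $a=\mu_1(t)$, $b=\mu_2(t)$ and summing over $t$ gives $$H(\alpha\mu_1+(1-\alpha)\mu_2)=\sum_{t\in X} f\big(\alpha\mu_1(t)+(1-\alpha)\mu_2(t)\big)\geq \sum_{t\in X}\Big(\alpha f(\mu_1(t))+(1-\alpha)f(\mu_2(t))\Big)=\alpha H(\mu_1)+(1-\alpha)H(\mu_2)$$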


What you are doing in taking $\alpha\mu_1+(1-\alpha)\mu_2$ is creating a combined measure; in other words, you can think of it as a mixture of measures. So let's continue with that line of thinking.

Define the random variable $b$ taking the value $1$ with probability $\alpha$ and the value $2$ with probability $(1-\alpha)$. Now define $\mu=\mu_b$. Note that $\mu$ is a mixture (or a random measure) of $\mu_1$ and $\mu_2$ and that $\forall t\in X,\; \mu(t)=\alpha\mu_1(t)+(1-\alpha)\mu_2(t)$.

Now, since conditioning reduces entropy, we have $H(\mu ~|~ b)\leq H(\mu)$, which means $\alpha H(\mu_1)+ (1-\alpha)H(\mu_2)\leq H(\alpha\mu_1+(1-\alpha)\mu_2)$.
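Here is a concrete numeric check of this argument (the distributions and $\alpha$ below are made up): by the definition of conditional entropy, $H(\mu \mid b) = \alpha H(\mu_1) + (1-\alpha)H(\mu_2)$, and we can compare it against $H(\mu)$ directly:

```python
import math

def H(mu):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in mu if p > 0)

# Made-up example: b = 1 with probability alpha, b = 2 otherwise.
alpha = 0.3
mu1 = [0.5, 0.5, 0.0]
mu2 = [0.2, 0.3, 0.5]

# Marginal law of the sample: the mixture mu = alpha*mu1 + (1-alpha)*mu2.
mu = [alpha * p + (1 - alpha) * q for p, q in zip(mu1, mu2)]

h_cond = alpha * H(mu1) + (1 - alpha) * H(mu2)  # H(mu | b)
h_marg = H(mu)                                  # H(mu)
print(h_cond <= h_marg)  # True: conditioning reduces entropy
```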


Here is an expanded version of Lafon's answer. Let's first rewrite the RHS of your inequality (the goal being $LHS\leq RHS$), where $$RHS=-\alpha \sum _{t\in X} \mu_1(t) \log(\mu_1(t))-(1-\alpha)\sum _{t\in X}\mu_2(t) \log(\mu_2(t))$$ $$ = - \sum _{t\in X} \alpha \mu_1(t) \log(\mu_1(t))-\sum _{t\in X}(1-\alpha)\mu_2(t) \log(\mu_2(t))$$ $$ = \sum _{t\in X}\big( -\alpha \mu_1(t) \log(\mu_1(t))-(1-\alpha)\mu_2(t) \log(\mu_2(t))\big)$$ Now we rewrite the LHS: $$ LHS=\sum _{t\in X} -\big(\alpha\mu_1(t)+(1-\alpha)\mu_2(t)\big)\log\big(\alpha\mu_1(t)+(1-\alpha)\mu_2(t)\big)$$ Now look at the terms inside the summations: if we can prove that every single term of the LHS is at most the corresponding term of the RHS, i.e. $$ -\big(\alpha\mu_1(t)+(1-\alpha)\mu_2(t)\big)\log\big(\alpha\mu_1(t)+(1-\alpha)\mu_2(t)\big) \leq -\alpha \mu_1(t) \log(\mu_1(t))-(1-\alpha)\mu_2(t) \log(\mu_2(t)) $$ then the sum (LHS) will also be at most the sum (RHS). Note that this termwise condition is $\textit{sufficient but not necessary}$ for the original inequality to hold.

The next step is clear: following Lafon's suggestion, show that the function $f(x)=-x \log(x)$, $x\in \mathbb{R}^+$, is concave; concavity of $f$ gives exactly the termwise inequality for every $t\in X$.

This completes the proof.
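As a numeric sanity check of the pointwise step (random points in $(0,1)$, natural log; purely illustrative):

```python
import math
import random

def f(x):
    """f(x) = -x log x, extended continuously by f(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

random.seed(0)
for _ in range(1000):
    a, b = random.random(), random.random()
    alpha = random.random()
    # Concavity: the chord lies below the graph (up to float tolerance).
    assert f(alpha * a + (1 - alpha) * b) >= alpha * f(a) + (1 - alpha) * f(b) - 1e-12
print("pointwise concavity holds on all samples")
```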