Transformation of entropy


Let $X$ be a discrete random variable with values in $A = \{+1,+2,+4,+8\}$, such that the entropy $H(X) = 2$ bits.

For $Y=\log_2(X)$, such that $Y\in\{+0,+1,+2,+3\}$, we find that

  • $P(Y = +0) = P(Y = +3) = 0$
  • $P(Y = +1) = P(Y = +2) = \frac{1}{2}$

Is this right?

Second question: prove that there is no function $g$ such that $H(g(X)) > H(X)$.

I think we should start by assuming such a $g$ exists with $H(g(X)) > H(X)$, and then derive a contradiction using Gibbs' inequality, $$\sum_{i=1}^4 p_i \log_2\frac{q_i}{p_i} \leq 0,$$ with $p_i$ the probabilities of $X$ and $q_i$ the probabilities of $g(X)$. I'm not sure how to make use of this inequality.
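Gibbs' inequality itself is easy to sanity-check numerically. A minimal sketch (the helper names `gibbs_lhs` and `random_dist` are illustrative, not from any library), drawing random distributions and confirming the sum never exceeds zero:

```python
import math
import random

def gibbs_lhs(p, q):
    """Compute sum_i p_i * log2(q_i / p_i), skipping zero-probability terms."""
    return sum(pi * math.log2(qi / pi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(n, rng):
    """A random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(1000):
    p = random_dist(4, rng)
    q = random_dist(4, rng)
    # Gibbs' inequality: the sum is <= 0, with equality iff p == q
    assert gibbs_lhs(p, q) <= 1e-12
```

Equality holds exactly when the two distributions coincide, which is the usual way the inequality is used in entropy arguments.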


On BEST ANSWER

No. First you need to show $P(X=a)=\frac{1}{4}$ for every $a\in\{1, 2, 4, 8\}$. There is a simple idea behind the scenes: since there are four possible outcomes of $X$, we only need two bits to encode $X$, hence its entropy is at most $2$, and it is well known that for discrete variables this upper bound is achieved only by the uniform distribution.

Rigorously, $H(X)=\sum_a p_a\log_2 \frac{1}{p_a}\le \log_2\left(\sum_a \frac{p_a}{p_a}\right)=\log_2 4=2$ by the concavity of $\log_2$ (Jensen's inequality), and equality holds iff the $p_a:=P(X=a)$ are all equal. To be completely rigorous, we should also handle the case $p_a=0$ for some $a$: ignoring those terms and applying concavity to the at most three remaining ones gives $H(X)\le \log_2 3<2$, so no outcome can have zero probability.

Now $P(Y=a)=P(X=2^a)=\frac{1}{4}$.
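This is easy to confirm numerically; a small sketch (the `entropy` helper is illustrative) computing $H(X)$ for the uniform law and $H(Y)$ for the induced law of $Y=\log_2(X)$:

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# X uniform on {1, 2, 4, 8}  =>  H(X) = 2 bits
p_x = {1: 0.25, 2: 0.25, 4: 0.25, 8: 0.25}
print(entropy(p_x.values()))  # 2.0

# Y = log2(X) is then uniform on {0, 1, 2, 3}, so H(Y) = 2 bits as well
p_y = {int(math.log2(x)): p for x, p in p_x.items()}
print(entropy(p_y.values()))  # 2.0
```

The entropies agree because $x\mapsto\log_2 x$ is injective on the support, so it only relabels the outcomes.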

As for the second part, intuitively: if we can transmit $X$ with $n$ bits on average, then we can transmit $g(X)$ with at most $n$ bits, since the receiver can apply the function $g$ herself once $X$ is known; hence $H(g(X))\le H(X)$. A rigorous proof is not hard either:

$$H(g(X))=\sum_a P(g(X)=a)\log_2 \frac{1}{P(g(X)=a)}=-\sum_a P(X\in g^{-1}(a))\log_2 P(X\in g^{-1}(a))$$

And because $\log_2$ is monotone and $P(X=b)\le P(X\in g^{-1}(a))$ for each $b\in g^{-1}(a)$, $$\begin{aligned} P(X\in g^{-1}(a))\log_2 P(X\in g^{-1}(a)) &=\Big(\sum_{b\in g^{-1}(a)}P(X=b)\Big)\log_2\Big(\sum_{b\in g^{-1}(a)}P(X=b)\Big) \\ &\ge\sum_{b\in g^{-1}(a)}P(X=b)\log_2 P(X=b). \end{aligned}$$ Summing over $a$ and negating gives $H(g(X))\le H(X)$.

This also follows from more advanced inequalities in information theory (it is a special case of the data-processing inequality), but as shown above it is really just elementary calculus.
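As a sanity check, $H(g(X))\le H(X)$ can be tested numerically under random laws of $X$ and random (generally non-injective) choices of $g$; a sketch with illustrative helper names:

```python
import math
import random

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

rng = random.Random(1)
support = [1, 2, 4, 8]
for _ in range(1000):
    # Random law of X on the support
    w = [rng.random() for _ in support]
    s = sum(w)
    p_x = dict(zip(support, (x / s for x in w)))
    # Random (possibly non-injective) g: each value maps to one of two outputs
    g = {x: rng.choice(["a", "b"]) for x in support}
    # Push the law of X forward through g to get the law of g(X)
    p_gx = {}
    for x, p in p_x.items():
        p_gx[g[x]] = p_gx.get(g[x], 0.0) + p
    assert entropy(p_gx.values()) <= entropy(p_x.values()) + 1e-12
```

When $g$ merges outcomes the entropy strictly drops; when $g$ is injective on the support it is unchanged.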

ANOTHER ANSWER

For the second part: let $Y=g(X)$. We know $H(Y|X)=0$, because once $X$ is known, $Y=g(X)$ is determined.

We also know the chain rule: $H(X,Y)=H(X)+H(Y|X)=H(Y)+H(X|Y)$

Hence, in our case $H(Y)= H(X)-H(X|Y)$. Because $H(X|Y)\ge 0$, we conclude

$$H(Y)=H(g(X)) \le H(X)$$
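The chain-rule argument can be checked on a concrete example; a sketch assuming a non-uniform law on $\{1,2,4,8\}$ and the hypothetical, non-injective map $g(x)=x \bmod 3$:

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A non-uniform law of X, and Y = g(X) = X mod 3
p_x = {1: 0.5, 2: 0.25, 4: 0.125, 8: 0.125}

def g(x):
    return x % 3  # g(1)=1, g(2)=2, g(4)=1, g(8)=2 -- not injective

# Joint law of (X, Y): all mass sits on pairs (x, g(x)), hence H(Y|X) = 0
p_xy = {(x, g(x)): p for x, p in p_x.items()}
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

h_x = entropy(p_x.values())
h_y = entropy(p_y.values())
h_xy = entropy(p_xy.values())

# Chain rule: H(X,Y) = H(X) + H(Y|X) = H(X), since Y is a function of X
assert abs(h_xy - h_x) < 1e-12
# Hence H(Y) = H(X) - H(X|Y) <= H(X)
assert h_y <= h_x
```

Here $g$ merges $\{1,4\}$ and $\{2,8\}$, so $H(X|Y)>0$ and the inequality is strict.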