Let $X$ be a discrete random variable taking values in $A = \{+1,+2,+4,+8\}$, with entropy $H(X) = 2$ bits.
For $Y=\log_2(X)$, such that $Y\in\{+0,+1,+2,+3\}$, we find that
- $P(Y = +0) = P(Y = +3) = 0$
- $P(Y = +1) = P(Y = +2) = \frac{1}{2}$
Is this right?
Second question: prove that there is no function $g$ such that $H(g(X)) > H(X)$.
I think we should start by assuming that such a $g$ exists with $H(g(X)) > H(X)$, then derive a contradiction using the Gibbs inequality, $$\sum_{i=1}^4 p_i \log_2\frac{q_i}{p_i} \leq 0,$$ where $p_i$ are the probabilities of $X$ and $q_i$ those of $g(X)$. I'm not sure how to make use of this inequality.
No. First you need to show that $P(X=a)=\frac{1}{4}$ for each $a\in\{1, 2, 4, 8\}$. There is a simple idea behind the scenes: since $X$ has four possible outcomes, two bits suffice to encode it, so its entropy is at most $2$, and it is well known that for discrete variables this upper bound is achieved only by the uniform distribution.
Rigorously, $H(X)=\sum_a p_a\log_2 \frac{1}{p_a}\le \log_2\big(\sum_a \frac{p_a}{p_a}\big)=\log_2 4=2$ by the concavity of $\log_2$ (Jensen's inequality), and equality holds iff the $p_a:=P(X=a)$ are all equal. To be completely rigorous, we should worry about $p_a=0$ for some $a$; in that case, ignoring those terms and applying concavity to the rest gives $H(X)\le \log_2 3<2$, so all four probabilities must be positive.
Now $P(Y=a)=P(X=2^a)=\frac{1}{4}$ for each $a\in\{0,1,2,3\}$.
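As a quick numerical sanity check (not part of the proof), here is a small Python sketch; `entropy` and `p_X` are my own helper names:

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Distribution of X: uniform on {1, 2, 4, 8}, as derived above.
p_X = {1: 0.25, 2: 0.25, 4: 0.25, 8: 0.25}
assert entropy(p_X) == 2.0  # H(X) = 2 bits

# Induced distribution of Y = log2(X): uniform on {0, 1, 2, 3}.
p_Y = {}
for x, p in p_X.items():
    y = int(log2(x))
    p_Y[y] = p_Y.get(y, 0.0) + p

print(p_Y)           # {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
print(entropy(p_Y))  # 2.0
```

In particular $P(Y=0)=P(Y=3)=\frac14$, not $0$ as claimed in the question.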
As for the second part: intuitively, if we can transmit $X$ with $n$ bits on average, then we can transmit $g(X)$ with at most $n$ bits, since the receiver can apply the function $g$ herself once $X$ is known; hence $H(g(X))\le H(X)$. A rigorous proof is not much harder:
$$H(g(X))=\sum_a P(g(X)=a)\log_2 \frac{1}{P(g(X)=a)}=-\sum_a P(X\in g^{-1}(a))\log_2 P(X\in g^{-1}(a))$$
And because $\log_2$ is monotone and each $P(X=b)\le\sum_{b'\in g^{-1}(a)}P(X=b')$, $$\begin{aligned} P(X\in g^{-1}(a))\log_2 P(X\in g^{-1}(a)) &=\Big(\sum_{b\in g^{-1}(a)}P(X=b)\Big)\log_2\Big(\sum_{b\in g^{-1}(a)}P(X=b)\Big) \\ &\ge\sum_{b\in g^{-1}(a)}P(X=b)\log_2 P(X=b). \end{aligned}$$ Summing over $a$ and negating both sides gives $H(g(X))\le H(X)$.
This also follows from more advanced information-theoretic machinery (for instance, the chain rule gives $H(X)=H(X,g(X))=H(g(X))+H(X\mid g(X))\ge H(g(X))$), but it's really just elementary calculus.
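The inequality $H(g(X))\le H(X)$ can also be checked numerically. Below is a minimal sketch; `entropy`, `pushforward`, and the particular merging function `g` are my own hypothetical helpers, not part of the problem:

```python
import random
from math import log2

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def pushforward(dist, g):
    """Distribution of g(X): P(g(X)=a) is the sum of P(X=b) over b in g^{-1}(a)."""
    out = {}
    for b, p in dist.items():
        out[g(b)] = out.get(g(b), 0.0) + p
    return out

# An example non-injective g that merges outcomes of X.
g = lambda x: 1 if x < 4 else 4

# H(g(X)) <= H(X) for many random distributions on {1, 2, 4, 8}.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(4)]
    s = sum(w)
    dist = dict(zip([1, 2, 4, 8], (v / s for v in w)))
    assert entropy(pushforward(dist, g)) <= entropy(dist) + 1e-12
```

For the uniform distribution above, this `g` collapses four equally likely outcomes into two, dropping the entropy from $2$ bits to $1$ bit.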