The implication of $H(X_1,X_2,\dots,X_n) = nH(X_1)$


Let $X_1,X_2,\dots ,X_n$ be a discrete random process and define $$a_n=\frac{1}{n}H(X_1,X_2,\dots ,X_n).$$ Suppose further that $a_n = a_{n+1}$ for all $n\in \mathbb{N}$. I think this implies that $X_1,X_2,\dots,X_n$ is an IID random process, but I don't know how to prove it. Obviously, the equality condition implies that for all $n\in \mathbb{N}$ $$H(X_1,X_2,\dots,X_n) = nH(X_1).$$

Does this last equality imply that the entropies are equal, i.e. $$H(X_1) = H(X_2)=\dots=H(X_n)?$$ For example, for $n=2$ we have $$H(X_1,X_2)\le H(X_1)+H(X_2) \implies 2H(X_1)\le H(X_1)+H(X_2) \implies H(X_1)\le H(X_2).$$ The pattern doesn't seem to continue for $n=3$: $$H(X_1,X_2,X_3)\le H(X_1)+H(X_2)+H(X_3) \implies 3H(X_1)\le H(X_1)+H(X_2)+H(X_3) \implies 2H(X_1)\le H(X_2)+H(X_3).$$

Edit: The answer by Misha Lavrov shows that in general the random process isn't IID. If we add the assumption that $X_1,X_2,\dots ,X_n$ is a stationary process, can we reach the conclusion that the RVs are independent, and hence IID? By stationary I mean $$\text{Pr}\{X_1 = x_1, X_2 = x_2,\dots,X_n = x_n\} = \text{Pr}\{X_{1+l} = x_1, X_{2+l} = x_2,\dots,X_{n+l} = x_n\}$$ for all integers $l$ and $n$.
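(For completeness, the step from $a_n = a_{n+1}$ to $H(X_1,\dots,X_n)=nH(X_1)$ mentioned above: since $a_1 = H(X_1)$, induction gives $$a_n = a_{n-1} = \dots = a_1 = H(X_1) \implies H(X_1,X_2,\dots,X_n) = n\,a_n = nH(X_1).)$$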

Best answer:

In general, $X_1, \dots, X_n$ don't have to be either independent or identically distributed.

First of all, even if we did have $H(X_1) = H(X_2) = \dots = H(X_n)$, that doesn't tell us anything definite about the distributions of $X_1, \dots, X_n$. We can find lots of examples of different distributions with the same entropy.

Second, we do not know that the entropies of $X_1, \dots, X_n$ are equal; we only know that $H(X_1)$ is equal to $H(X_2 \mid X_1)$ which is equal to $H(X_3 \mid X_1,X_2)$ and so on; in general, $$H(X_1) = H(X_n \mid X_1, \dots, X_{n-1}).$$ But the unconditional entropy of $X_n$ can be very different. For example, suppose that $Y_1, Y_2, \dots$ is a sequence of independent uniform samples from $\{0,1\}$, and for all $n$, $X_n = Y_1 + Y_2 + \dots + Y_n$. Then $$H(X_1, \dots, X_n) = H(Y_1, \dots, Y_n) = n$$ (in bits) because either of the $n$-tuples $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ determines the other. However, the unconditional $H(X_n)$ keeps growing: the distribution of $X_n$ is $\text{Binomial}(n,\frac12)$ and so $H(X_n)$ grows roughly as $\frac12 \log_2 n$.
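The growth of $H(X_n)$ in this counterexample can be checked numerically. The following sketch (my own, not part of the original answer) computes the exact entropy of $X_n \sim \text{Binomial}(n, \frac12)$ from its pmf and compares it with $\frac12 \log_2 n$; meanwhile each conditional entropy $H(X_n \mid X_1,\dots,X_{n-1}) = H(Y_n)$ stays at exactly 1 bit.

```python
# Counterexample: X_n = Y_1 + ... + Y_n with Y_i iid uniform on {0, 1},
# so X_n ~ Binomial(n, 1/2) and H(X_n) = -sum_k p_k log2 p_k
# with p_k = C(n, k) / 2^n.
from math import comb, log2

def binomial_entropy_bits(n: int) -> float:
    """Exact entropy in bits of Binomial(n, 1/2)."""
    probs = [comb(n, k) / 2**n for k in range(n + 1)]
    return -sum(p * log2(p) for p in probs if p > 0)

# H(X_n) keeps growing roughly like (1/2) log2 n,
# even though H(X_n | X_1, ..., X_{n-1}) = 1 bit for every n.
for n in [1, 2, 10, 100, 1000]:
    print(n, round(binomial_entropy_bits(n), 3), round(0.5 * log2(n), 3))
```

Note that `binomial_entropy_bits(1)` is exactly 1 bit, matching $H(X_1) = H(Y_1)$, while the joint entropy $H(X_1,\dots,X_n) = n$ bits stays far above the marginal $H(X_n)$.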

(The entropy $H(X_n)$ also does not have to keep increasing with $n$. For example, we could follow the pattern above for $X_1, \dots, X_{99}$ but then decide that $X_{100} = Y_{100}$, for a sudden drop in entropy.)

Intuitively, the condition in the question only says that the total entropy in the system grows linearly: nothing more and nothing less. Each new variable $X_n$ contributes the same amount of fresh randomness, but it may also depend on $X_1, \dots, X_{n-1}$, which can inflate its unconditional entropy.


One thing we can say is that if $X_1, \dots, X_n$ are identically distributed, then the entropy condition tells us that they are independent.

In this case, we know $H(X_1) = H(X_n)$ for all $n$, so an earlier equation becomes $$H(X_n) = H(X_n \mid X_1, \dots, X_{n-1}).$$ In general, $H(X) = H(X \mid Y)$ holds if and only if $X$ and $Y$ are independent: the difference $H(X) - H(X \mid Y)$ is the mutual information $I(X;Y)$, which is nonnegative and zero exactly when $X$ and $Y$ are independent. Intuitively, $H(X) = H(X \mid Y)$ says that nothing about $X$ is learned by knowing $Y$. So in our case, $X_n$ is independent of $(X_1, \dots, X_{n-1})$ for each $n$, which means that all the random variables in our process are independent.
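The equivalence between $H(X) = H(X \mid Y)$ and independence can be illustrated on small joint distributions. This sketch (my own construction, not from the answer) computes $H(X)$ and $H(X \mid Y)$ directly from a joint pmf for one independent pair and one fully dependent pair:

```python
# H(X) = H(X | Y) holds exactly when I(X; Y) = 0, i.e. X, Y independent.
from math import log2

def entropy(dist):
    """Entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal_x(joint):
    """Marginal pmf of X from a joint pmf {(x, y): p}."""
    px = {}
    for (x, _y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def conditional_entropy(joint):
    """H(X | Y) = -sum p(x, y) log2 p(x | y)."""
    py = {}
    for (_x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    return -sum(p * log2(p / py[y]) for (_x, y), p in joint.items() if p > 0)

# Independent fair bits: H(X) == H(X | Y) == 1 bit.
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# Fully dependent (X == Y): H(X) = 1 bit but H(X | Y) = 0.
dep = {(0, 0): 0.5, (1, 1): 0.5}

print(entropy(marginal_x(indep)), conditional_entropy(indep))  # equal
print(entropy(marginal_x(dep)), conditional_entropy(dep))      # 1.0 vs 0.0
```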

(In particular, for a stationary process, $X_1, X_2, \dots$ are identically distributed, and therefore they are IID.)