Convergence to normal distribution


Consider the probability distribution of the simple symmetric walk. That is, each random variable $X_i$ equals $c$ or $-c$ with equal probability, the $X_i$ are independent, and $c\geq 1$. We are interested in

$$S_n = X_1 + X_2 + \dots + X_n.$$

We know from the central limit theorem that $S_n/\sqrt{n}$ converges in distribution to the normal distribution $N(0,c^2)$.

We also know that the entropy of the normal distribution $N(0,c^2)$ is $\frac{1}{2}\ln(2\pi e c^2)$.

Clearly we cannot derive the entropy of $S_n$ as $n \rightarrow \infty$ directly from this formula for the normal distribution: the entropy of $S_n$ does not depend on $c$, while the entropy of the normal distribution does.

The Wikipedia page on differential entropy gives a correction term, but I can't see how to apply it.

How exactly do you apply the correction term to $\frac{1}{2}\ln(2\pi e \sigma^2)$ in this example to get the correct entropy for $S_n$ as $n \to \infty$?



BEST ANSWER

Let $Y$ be a discrete (lattice) variable, taking values at points $y_i$, equispaced at intervals of length $\delta$. Suppose also that the distribution of $Y$, $F_Y$, converges to a continuous distribution $F_Z$ as $\delta \to 0$. Assume that $Z$ is well behaved ($f_Z$ exists, etc).

The relation between the (discrete) entropy $H_Y$ and the (differential) entropy $h_Z$ is derived as follows:

$$ H_Y=-\sum_{i} P(y=y_i) \log P(y=y_i)$$

But $P(y=y_i) \approx f_Z(y_i) \, \delta$. Then $$ H_Y\approx -\sum_{i} f_Z(y_i)\, \delta \log\bigl(f_Z(y_i)\, \delta \bigr)= -\sum_{i} f_Z(y_i)\, \delta \log f_Z(y_i) -\log \delta \sum_{i} f_Z(y_i)\, \delta \approx -\int f_Z(z) \log f_Z(z)\, dz -\log \delta \int f_Z(z)\, dz $$

Hence, since $\int f_Z(z)\, dz = 1$, as $\delta \to 0$ $$ H_Y \to h_Z -\log \delta$$
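As a quick sanity check of this relation, here is a short Python sketch (the function name is my own) that discretizes a standard normal density on a lattice of spacing $\delta$ and compares the resulting discrete entropy to $h_Z - \ln\delta$:

```python
import math

def discrete_entropy_of_gaussian(delta, sigma=1.0, cutoff=10.0):
    """Entropy (nats) of a lattice variable with P(y_i) ~ f_Z(y_i) * delta,
    where f_Z is the N(0, sigma^2) density, truncated at +/- cutoff*sigma."""
    n = int(cutoff * sigma / delta)
    probs = []
    for i in range(-n, n + 1):
        y = i * delta
        f = math.exp(-y * y / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        probs.append(f * delta)
    Z = sum(probs)  # renormalize away the tiny truncation error
    return -sum(p / Z * math.log(p / Z) for p in probs)

h = 0.5 * math.log(2 * math.pi * math.e)  # differential entropy of N(0,1)
for delta in (0.5, 0.1, 0.01):
    print(delta, discrete_entropy_of_gaussian(delta), h - math.log(delta))
```

As $\delta$ shrinks, the two printed columns agree to more and more digits, as the derivation above predicts.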

In our case: $S_n/\sqrt{n}$ takes values at points separated by intervals of length $\delta=2c/\sqrt{n}$, and, as $n\to \infty$, $S_n/\sqrt{n}$ tends to a normal $Z\sim N(0,c^2)$ (and $\delta \to 0$).

Then $h_Z=\frac{1}{2}\ln(2\pi e c^2)$ and

$$H_Y\to \frac{1}{2}\ln(2\pi e c^2) -\ln \frac{2 c}{\sqrt{n}}=\frac{1}{2} \ln \frac{\pi e n}{2}$$

(entropy is in nats; if you prefer bits/shannons, just change the base of the logarithm)

This is consistent with the fact that $H_Y$ should be invariant to scaling (hence it should not depend on $c$).
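This limit can be verified numerically. The sketch below (my own code, not part of the answer) computes the exact entropy of $S_n$ from the binomial probabilities $\binom{n}{k}/2^n$ — note that $c$ cancels out entirely — and compares it to $\frac{1}{2}\ln(\pi e n/2)$:

```python
import math

def entropy_Sn(n):
    """Exact entropy (nats) of S_n = X_1 + ... + X_n with X_i = +/-c.
    S_n takes the value (2k - n)c with probability C(n, k) / 2^n,
    so the entropy does not depend on c."""
    H = 0.0
    for k in range(n + 1):
        p = math.comb(n, k) / 2**n
        H -= p * math.log(p)
    return H

for n in (10, 100, 1000):
    print(n, entropy_Sn(n), 0.5 * math.log(math.pi * math.e * n / 2))
```

Already at $n = 1000$ the exact entropy matches $\frac{1}{2}\ln(\pi e n/2)$ to two decimal places.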


Nothing really.

The entropy of a random process $\{X_1,X_2, \ldots\}$ is defined to be $H(X) = \lim_{n \to \infty} \frac{H(X_1,\ldots,X_n)}{n}$ (Cover and Thomas, Elements of Information Theory, 2nd ed.). For stationary random processes, this coincides with the so-called entropy rate (also defined in the same reference).

In this case, the processes $\{S_n\}$ and $\{X_n\}$ determine each other bijectively ($X_n = S_n - S_{n-1}$), so $H(S_1,\ldots,S_n) = H(X_1,\ldots,X_n)$. The entropy of $(X_1,\ldots,X_n)$ is $n$ bits. Thus the entropy of the process $\{X_1,X_2,\ldots\}$ is the same as the entropy of the process $\{S_1,S_2,\ldots\}$, namely $1$ bit per symbol.
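A minimal numerical illustration of that count (my own sketch, assuming fair $\pm c$ steps): the $2^n$ sign sequences $(X_1,\ldots,X_n)$ are equally likely, so the joint entropy is exactly $n$ bits:

```python
import math

n = 10
p = 1 / 2**n  # probability of each of the 2**n equally likely sign sequences
# Joint entropy in bits: sum over all 2**n sequences of -p * log2(p)
H_bits = -sum(p * math.log2(p) for _ in range(2**n))
print(H_bits, H_bits / n)  # joint entropy = n bits, entropy rate = 1 bit/symbol
```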