Proof of Central Limit Theorem via MaxEnt principle


Let $X_i$'s be i.i.d. with mean $0$ and variance $\sigma^2$.

After reading Jaynes' book Probability Theory: The Logic of Science, I decided to try to actually prove the CLT via the following steps:

a) Group the random variables in the sum like so:

$$\Big(\big((X_1 + X_2) + (X_3 + X_4)\big) + \big((X_5 + X_6) + (X_7 + X_8)\big)\Big) + \dots$$

so that at each step I only have to deal with a convolution of two R.V.'s, normalized as $\frac{X_1 + X_2}{\sqrt{2}}$ to keep $\sigma$ fixed.
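Indeed, independence makes this normalization keep the variance fixed:

$$\operatorname{Var}\!\left(\frac{X_1 + X_2}{\sqrt{2}}\right) = \frac{\operatorname{Var}(X_1) + \operatorname{Var}(X_2)}{2} = \frac{\sigma^2 + \sigma^2}{2} = \sigma^2.$$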

b) (Jaynes) If $\frac{Y_1 + Y_2}{\sqrt{2}} \sim Y$ for independent copies $Y_1, Y_2$ of $Y$, then $Y \sim \mathcal{N}(0, \sigma^2)$; so convolution is a forgetful operator, and hence entropy must increase along our sum.
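The increase can be made precise with the Shannon–Stam entropy power inequality: for independent copies $Y_1, Y_2$ of $Y$,

$$e^{2H(Y_1 + Y_2)} \geq e^{2H(Y_1)} + e^{2H(Y_2)} = 2e^{2H(Y)},$$

and since $H(aY) = H(Y) + \log a$,

$$H\!\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) = H(Y_1 + Y_2) - \tfrac{1}{2}\log 2 \geq H(Y),$$

with equality precisely in the Gaussian case.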

c) Show that there is no distribution $Z \neq \mathcal{N}(0, \sigma^2)$ with mean $0$ and variance $\sigma^2$ such that

$$\mathbb{E} \left[ \left(\frac{Z_1 + Z_2}{\sqrt{2}} \right)^i \right] = \mathbb{E}[Z_1^i] \quad \text{for all } i \geq 3,$$

so convolution can't preserve anything but the first two moments.
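For example, for $i = 3$ and $i = 4$, expanding the power and using independence and mean zero gives

$$\mathbb{E}\left[\left(\frac{Z_1 + Z_2}{\sqrt{2}}\right)^3\right] = \frac{2\,\mathbb{E}[Z^3]}{2^{3/2}} = \frac{\mathbb{E}[Z^3]}{\sqrt{2}}, \qquad \mathbb{E}\left[\left(\frac{Z_1 + Z_2}{\sqrt{2}}\right)^4\right] = \frac{2\,\mathbb{E}[Z^4] + 6\sigma^4}{4},$$

so preserving these moments forces the Gaussian values $\mathbb{E}[Z^3] = 0$ and $\mathbb{E}[Z^4] = 3\sigma^4$.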

d) Hence this convolution, applied repeatedly, must converge to the Gaussian distribution $\mathcal{N}(0, \sigma^2)$, which has the nice property that its entropy is maximal among all distributions with the first two moments fixed.
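That maximality is the nonnegativity of relative entropy in disguise: writing $\varphi$ for the $\mathcal{N}(0, \sigma^2)$ density, any density $f$ with mean $0$ and variance $\sigma^2$ satisfies

$$0 \leq D(f \,\|\, \varphi) = -H(f) - \int f \log \varphi = -H(f) + \tfrac{1}{2}\log(2\pi e \sigma^2),$$

because $-\log \varphi(x) = \tfrac{1}{2}\log(2\pi\sigma^2) + \tfrac{x^2}{2\sigma^2}$ and $\int x^2 f(x)\,dx = \sigma^2$. Hence $H(f) \leq \tfrac{1}{2}\log(2\pi e \sigma^2) = H(\varphi)$, with equality iff $f = \varphi$.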

However, I have problems with c). It is not difficult to show if one assumes all moments are finite, but how would one deal with distributions whose higher moments are infinite?

I know about truncations of R.V.'s and characteristic functions, but if I prove c) using these tools, the proof loses its reliance on the MaxEnt principle.

Answer:

This seems awfully similar to the Banach fixed-point theorem. Let $H$ stand for entropy.

Introduce a metric $d$ on all distributions with mean $0$ and variance $\sigma^2$ by $d(X, Y) = |H(X) - H(Y)|$ (strictly speaking a pseudometric, since distinct laws can share an entropy). Introduce an operator $C(X) = \frac{X_1 + X_2}{\sqrt{2}}$ for independent copies $X_1, X_2$ of $X$. Claim:

$d(C(X), C(Y)) \leq q\, d(X, Y)$ for some $q \in [0, 1)$. Why could this be the case? I see $\mathcal{N}(0, \sigma^2)$ as the distinguished point of my metric space, playing the role of $0$ (it is special: its entropy is maximal). We know that convolution brings $X$ and $Y$ closer to $\mathcal{N}(0, \sigma^2)$, so it's akin to the $|x - y|$ metric on $\mathbb{R}$ with $f(x) = x/2$ as the operator.
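Worth noting: writing $\varphi$ for the $\mathcal{N}(0, \sigma^2)$ density, $-\mathbb{E}_f[\log \varphi]$ depends on $f$ only through its first two moments, which every point of this metric space shares with $\varphi$. So the distance to the would-be fixed point is exactly a relative entropy:

$$d(X, \mathcal{N}(0, \sigma^2)) = H(\mathcal{N}(0, \sigma^2)) - H(X) = D(X \,\|\, \mathcal{N}(0, \sigma^2)),$$

and the claimed contraction says precisely that $C$ shrinks the relative entropy to the Gaussian by a uniform factor.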

Then start with any $X$ and keep applying $C$ to reach the fixed point, $\mathcal{N}(0, \sigma^2)$. I couldn't show just now that my mapping is a contraction; any ideas?
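If it helps, here is a minimal numerical sketch (not a proof) of that question: it iterates $C$ on a grid and prints the entropy gaps $g_k = H(\mathcal{N}(0,1)) - H(X_k)$ together with their successive ratios, which play the role of $q$. The grid size, the uniform starting law with variance $1$, and the interpolation-based rescaling are all my own choices.

```python
import numpy as np

L, n = 40.0, 2**12 + 1                   # symmetric grid, odd length so 0 is a node
x = np.linspace(-L / 2, L / 2, n)
dx = x[1] - x[0]

def entropy(f):
    """Differential entropy, estimated as -sum p log p * dx on the grid."""
    p = f[f > 1e-300]
    return -np.sum(p * np.log(p)) * dx

def C(f):
    """Density of (X1 + X2)/sqrt(2) for independent copies with density f."""
    g = np.convolve(f, f, mode="same") * dx             # density of X1 + X2
    h = np.sqrt(2) * np.interp(np.sqrt(2) * x, x, g)    # rescale by sqrt(2)
    return h / (np.sum(h) * dx)                         # renormalize grid error

H_gauss = 0.5 * np.log(2 * np.pi * np.e)                # entropy of N(0, 1)

# Start from Uniform[-sqrt(3), sqrt(3)]: mean 0, variance 1.
f = np.where(np.abs(x) <= np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)

gaps = []
for _ in range(6):
    gaps.append(H_gauss - entropy(f))
    f = C(f)

for k in range(5):
    print(f"g_{k} = {gaps[k]:.5f},  g_{k+1}/g_{k} = {gaps[k + 1] / gaps[k]:.3f}")
```

If the contraction claim holds, the printed ratios should stay bounded away from $1$.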