Large numbers and CLT: confusion over the behavior of the sum of iid random variables


In a nutshell, I am confused by the fact that the fluctuations of the sum grow as $ \sqrt n $ while the empirical mean converges (its fluctuations shrink as $ \frac{1}{\sqrt n} $). Below are the reasons for my confusion and a few examples that puzzle me.

In the simplest version of the law of large numbers, the empirical mean of $n$ independent and identically distributed (iid) random variables converges in probability to the mean $u$ of the underlying true distribution as $n$ tends to infinity.

Now, I am tempted to say that, since the sample mean converges, the sum of $n$ iid random variables converges to $n \cdot u$; or, if we don't want to use the word 'converge' because there is still an $n$ in the limit, we can say that the sum behaves as $n \cdot u$. Is this right? Does this mean, for 1-D RVs, that as $ n \rightarrow \infty $ only the sequences $ \{x_1, x_2, \ldots, x_n\} $ whose terms sum to $n \cdot u$ survive?

In fact, by the central limit theorem, assuming the RV $X$ has finite variance, the probability distribution of the sum approaches a normal distribution with mean $ n \cdot u $ and variance $ \sigma^2 \cdot n$, which means that the fluctuations keep growing with $n$. So, for example, if I toss a fair coin $n$ times, it is not actually true that as $n$ goes to infinity I will surely get $n/2$ tails and $n/2$ heads (that would mean the value of the sum was fixed). Is this right? It feels slightly strange to me that the empirical mean will almost surely (almost surely in an informal sense here) be $u$ and yet the sum won't be $n \cdot u$.
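To make my coin-toss example concrete, here is a quick numerical check (a Python/NumPy sketch; the sample sizes and the number of repeated experiments are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fair coin: for each n, estimate the typical absolute deviation of the
# number of heads from n/2, averaged over 2000 repeated experiments.
ns = [100, 10_000, 1_000_000]
abs_devs = []
for n in ns:
    heads = rng.binomial(n, 0.5, size=2000)  # heads count in n tosses
    abs_devs.append(np.abs(heads - n / 2).mean())

for n, d in zip(ns, abs_devs):
    # Absolute deviation grows ~ sqrt(n); relative deviation shrinks ~ 1/sqrt(n).
    print(f"n={n:>9}: |heads - n/2| ~ {d:7.1f}   as fraction of n: {d / n:.6f}")
```

The absolute deviation from $n/2$ keeps growing, while the deviation as a fraction of $n$ shrinks, which is exactly the tension I am asking about.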

In the same spirit, I can express the law of large numbers by saying that the empirical distribution $ \sum_i \delta(x - x_i)/n $ converges to $ P(x) $. Then consider a 1-D random walk of a particle on the $x$ axis, starting from the origin, with steps of $-1$ or $+1$ with equal probability. The mean squared distance grows as $n$ in the limit. If I say, though, that as $ n \rightarrow \infty $ the numbers of occurrences of steps $-1$ and $+1$ satisfy $ n_{-1} \rightarrow n/2 $ and $ n_{+1} \rightarrow n/2 $, it seems obvious that for $ n \rightarrow \infty $ the particle will be at the origin, and it's hard to understand how the root-mean-square distance could possibly behave as $ \sqrt n $. This is wrong, isn't it?
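The random-walk claim is also easy to test numerically. A small Python sketch (walk lengths and the number of walks are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric 1-D random walk: steps of -1 or +1 with equal probability.
# The fraction of each step type tends to 1/2, yet the mean squared
# end-to-end distance still grows linearly in n.
ns = [100, 400, 1600]
msd = []
for n in ns:
    steps = rng.choice([-1, 1], size=(5000, n))  # 5000 independent walks
    endpoints = steps.sum(axis=1)
    msd.append(np.mean(endpoints ** 2))

for n, m in zip(ns, msd):
    print(f"n={n:>5}: mean squared distance ~ {m:7.1f}   (theory: {n})")
```

So the particle does not end up at the origin: the mean squared distance tracks $n$ even though $n_{-1}/n$ and $n_{+1}/n$ both tend to $1/2$.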

In general, given an RV $X$ with distribution $P(x)$, well-behaved enough for some version of the law of large numbers and the CLT to hold, can I say that the number of occurrences is $ N_x = P(x) \cdot N $ as $ N \rightarrow \infty $? I notice that people say this in many situations; I have done so myself without really thinking about it on a fair number of occasions. Yet I have doubts now, because the fluctuations in the number of occurrences grow with $N$.
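As an illustration of the kind of statement I mean, take a fair six-sided die with $P(x) = 1/6$ (an example of my own choosing, sketched in Python):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fair six-sided die, P(x) = 1/6: the count N_x of face 1 drifts away from
# P(x)*N in absolute terms while N_x / N still converges to 1/6.
results = []
for N in [600, 60_000, 6_000_000]:
    rolls = rng.integers(1, 7, size=N)
    N_x = np.count_nonzero(rolls == 1)
    results.append((N, N_x))
    print(f"N={N:>8}: N_x - N/6 = {N_x - N / 6:+9.1f}   N_x/N = {N_x / N:.5f}")
```

The absolute discrepancy $N_x - P(x) N$ is typically of order $\sqrt{N}$, so writing $N_x = P(x) \cdot N$ can only be meant in the relative sense $N_x / N \rightarrow P(x)$.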

I study physics, and I'm learning some serious statistics and probability as part of a statistical mechanics course (just so you know my background). In many situations in statistical mechanics the energy of the system is a sum of many independent terms, and it would be a disaster if the fluctuations grew with the number of terms. This adds to my confusion.

Thanks in advance for any responses.


There are 2 best solutions below


If $X_1, X_2, \ldots, X_n$ are iid random variables with mean $\mu$ and standard deviation $\sigma$, the sum is $S_n = \sum_{j=1}^n X_j$ and the empirical mean is $M_n = S_n/n$. $S_n$ has mean $n \mu$ and standard deviation $\sqrt{n} \sigma$, so $M_n$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The Central Limit Theorem says the distribution of $Z_n = \dfrac{S_n - n \mu}{\sqrt{n} \sigma} = \dfrac{\sqrt{n} (M_n - \mu)}{\sigma}$ approaches the standard normal distribution. The fluctuations in $S_n$ are on the order of $\sqrt{n}$, and the fluctuations in $M_n$ are on the order of $1/\sqrt{n}$.
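The two scalings above are easy to verify numerically. A sketch using exponential variables with $\mu = \sigma = 1$ (the choice of distribution, and the values of $n$ and the number of trials, are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# iid exponential(1) variables: mu = sigma = 1. The standard deviation of
# S_n should be close to sqrt(n)*sigma, that of M_n close to sigma/sqrt(n).
n, trials = 2500, 4000
samples = rng.exponential(scale=1.0, size=(trials, n))
S_n = samples.sum(axis=1)
M_n = S_n / n
print(f"sd(S_n) ~ {S_n.std():6.1f}   vs sqrt(n)*sigma = {np.sqrt(n):6.1f}")
print(f"sd(M_n) ~ {M_n.std():.4f}   vs sigma/sqrt(n) = {1 / np.sqrt(n):.4f}")
```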

In your example, if you toss a fair coin $n$ times, the numbers of heads and tails are very unlikely to be exactly $n/2$. They will differ from $n/2$ by something on the order of $\sqrt{n}$.


Robert Israel has covered most of your concerns, but I feel that this is too large for a comment.

Looking at your concern about "large fluctuations", the absolute fluctuations really do grow with $n$. Specifically, the absolute fluctuations of the sum $S_n$ are on the order of $\sqrt{n}$. But the relative fluctuations shrink. In particular the fluctuations in the mean are only on the order of $1/\sqrt{n}$.

This isn't as bad as it sounds. Choosing the mean and variance to be $1$ for illustration, with $N=10000$ the sum has mean $10000$ and standard deviation $\sqrt{N}=100$, so it is essentially confined to within a few hundred of $10000$: a relative fluctuation of about $1\%$. This trend of decreasing relative fluctuations continues as you increase $N$ further.
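A sketch of this confinement, using exponential(1) summands as an arbitrary concrete choice with $\mu = \sigma = 1$:

```python
import numpy as np

rng = np.random.default_rng(4)

# mu = sigma = 1 (exponential(1) summands), N = 10000: the sum has mean
# 10000 and standard deviation sqrt(N) = 100, so nearly all of the mass
# lies within a few hundred of 10000, i.e. a relative spread of about 1%.
N, trials = 10_000, 1000
sums = rng.exponential(scale=1.0, size=(trials, N)).sum(axis=1)
within = np.mean(np.abs(sums - N) < 3 * np.sqrt(N))
print(f"mean ~ {sums.mean():.0f}, sd ~ {sums.std():.0f}, "
      f"fraction inside (9700, 10300): {within:.3f}")
```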

In statistical mechanics, or more specifically statistical thermodynamics, there is a little more going on, because when we pass to the limit of many particles, we also pass to the limit of giving them plenty of space to exist. Specifically we assume something like $V=N/\rho$, where $\rho>0$ is fixed, and then send $N \to \infty$ together with $V$. This complicates matters, because the kinetic energy per particle goes down as you increase the volume. Try doing this in the particle-in-a-box system.