An unusual setup where sampling without replacement from an infinite sequence might lead to a central limit theorem

88 Views Asked by At

Let $a_1, ..., a_N$ denote the alternating sequence $-1, 1, -1, 1, ..., -1, 1$ (assume $N$ is even for simplicity). Let $x_1, ..., x_N$ denote a sequence obtained by random sampling without replacement from the entire alternating sequence. Consider the random variable: \begin{equation} s_N = N^{-1/2} \sum_{n=1}^N x_n a_n . \end{equation} For finite $N$, $s_N$ is a discrete random variable and when $N$ is small it is straightforward to just write the distribution down. For example, if $N=2$, then $\mathbb{P}(s_N = -2/\sqrt{2}) = 1/2$ and $\mathbb{P}(s_N = 2/\sqrt{2}) = 1/2$. As $N$ gets large, the number of possible outcomes for $s_N$ gets very large.

Interestingly, under simulation, when $N$ is large, $s_N$ starts to look an awful lot like a like a Standard Normal random variable.

Question: Does $s_N$ have a CLT as $N \rightarrow \infty$? If so, how would one go about proving this?

UPDATE: md5 method is very nice. However I've also realized that once you write down the finite sample distribution as md5 has, this is pretty much all you need to do, since that probability distribution just so happens to be a special case of the Hypergeometric distribution, which further just so happens to satisfy the assumptions in equation 7.4 in Chapter 7 of Feller's An Introduction to Probability Theory and its Applications: Volume 1 and so the Normal limit is known. See also this question here

2

There are 2 best solutions below

5
On BEST ANSWER

Here it is not too hard to explicitly compute the distribution of $s_{2n}$. By counting the number of $x_k$ that match $a_k$, we see that for all $x\in\{0,\ldots,n\}$, $$ P(\sqrt{2n}s_{2n}=2n-4x)=\frac{\binom{n}{x}\binom{n}{n-x}}{\binom{2n}{n}}. $$ Now it suffices to apply some local limit theorem. What we need to prove is a statement of the form: for all $$ x_n=\frac{n}{2}+\frac{\sqrt{n}t}{4}+o(\sqrt{n}), $$ it holds that $P(s_{2n}=x_n)/\sqrt{n}\to e^{-t^2/2\sigma^2}/\sqrt{2\pi\sigma^2}.$ This should not be too hard, using standard estimates such as $$ \binom{n}{\frac{n}{2}+\sqrt{n}t/4+o(\sqrt{n})}2^{-n}\sim\sqrt{\frac{2}{\pi n}}e^{-t^2/2}. $$

0
On

This answer is here to flesh out the detail in the method provided by md5 in the accepted answer.

As md5 notes, given the setup: \begin{equation} \mathbb{P}(s_{2n} = -2n + 4k) = \frac{\binom{n}{k} \binom{n}{n-k}}{\binom{2n}{n}} , \quad k = 0, 1, ..., n . \end{equation} A well known result is that is also straightforward to establish via Stirling's approximation is: \begin{equation} \binom{2n}{n} = 2^{2n} \frac{1}{\sqrt{\pi n}} (1 + O(1/n)) . \end{equation} Substituting this in to the above, and noting that $\binom{n}{n-k} = \binom{n}{k}$ gives: \begin{equation} \mathbb{P}(s_{2n} = -2n + 4k) = \sqrt{\pi n} \binom{n}{k} 2^{-n} \binom{n}{k} 2^{-n} (1 + O(1/n)) . \end{equation} Another well known result sometimes referred to as the Normal approximation to the binomial means that if $d = O(\sqrt{n})$, then: \begin{equation} \binom{n}{\frac{n}{2} - d} 2^{-n} = \frac{1}{\sqrt{\pi \frac{n}{2}}} e^{-2d^2 / n} + O(1/n^{3/2}) \end{equation} Setting $k = \frac{n}{2} - d$, and using this approximation, we get: \begin{equation} f(x) = \mathbb{P}(s_{2n} = -2n + 4k) = \sqrt{\pi n} \frac{2}{\pi n}e^{-4d^2 / n} (1 + O(1/n)) . \end{equation} $x$ denotes outcomes for $s_{2n}$, so $x = -2n + 4k$, which means: \begin{equation} k = \frac{x + 2n}{4} . \end{equation} Also by our definitions $d = \frac{n}{2} - k$, so subbing in for $k$: \begin{equation} d = \frac{n}{2} - \frac{x + 2n}{4} , \end{equation} which simplifies to $d = \frac{x}{4}$. Thus: \begin{equation} -\frac{4d^2}{n} = -\frac{x^2}{2(2n)} . \end{equation} Subbing in to our probability function gives: \begin{equation} f(x) = \sqrt{\pi n} \frac{2}{\pi n}e^{-x^2 / 2(2n)} (1 + O(1/n)) . \end{equation} At this point things are looking good, because our exponent term is correct for a Normal distribution with zero mean and variance $2n$, which is exactly what we want. The problem comes when we evaluate the term in front of the exponent. We want the term in front of the exponent to be: \begin{equation} \frac{1}{\sqrt{2n} \sqrt{2 \pi}} , \end{equation} because the standard deviation is $\sqrt{2n}$. But instead we have: \begin{equation} \frac{2}{\sqrt{n} \sqrt{\pi}} . \end{equation} Everything would be correct if that $2$ was in the denominator instead of the numerator, but it isn't, and I can't see where I went wrong here.