I have $n$ real numbers $x_1, \cdots, x_n$ (with $n$ even), and I pick half of them uniformly at random, denoting by $X$ the sum of the chosen $\frac{n}{2}$ values. Is the random variable $X$ approximately normal as $n$ grows large? My simulations indicate this, but I hope for a mathematical justification.
Add a mild restriction on the values of $x_i$: suppose $\sum_{i=1}^{n} x_i = 0$ and $\frac{1}{n-1}\sum_{i=1}^{n} x^2_i = 1$. This is essentially a normalization and can be assumed without loss of generality. In this case, $\mathbb{E}(X) = 0$ and $\mathrm{Var}(X) = \frac{n}{4}$. Let us consider how close $$\frac{X-\mathbb{E}(X)}{\sqrt{\mathrm{Var}(X)}} = \frac{2X}{\sqrt{n}}$$ is to standard normal.
What I have tried: let $z_i$ be the binary indicator with $z_i = 1$ if $x_i$ is chosen, so that $X = \sum_{i=1}^{n} x_i z_i$. Each $z_i$ follows a $\operatorname{Bernoulli}(\frac{1}{2})$ distribution, but the $z_i$ are negatively correlated, with pairwise correlation $-\frac{1}{n-1}$. So the standard central limit theorem does not apply. The intuition, however, is that as $n$ grows large the correlation becomes weaker and weaker, so the $z_i$ are nearly independent. It is tempting to use Stein's method to justify this, but that appears too technical to me, and I wonder if a simpler tool exists.
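For concreteness, here is a minimal sketch of the kind of simulation I ran (the particular $x_i$ below are arbitrary; any fixed values normalized as above would do):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# arbitrary fixed values, normalized so that sum(x) = 0 and sum(x^2) = n - 1
x = rng.standard_normal(n)
x -= x.mean()
x *= np.sqrt((n - 1) / (x ** 2).sum())

# draw many uniformly random half-subsets and standardize X as 2X / sqrt(n)
samples = np.array([
    2 * x[rng.choice(n, n // 2, replace=False)].sum() / np.sqrt(n)
    for _ in range(10_000)
])
print(samples.mean(), samples.var())  # close to 0 and 1; histogram looks Gaussian
```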
The answer is negative in general. For a counterexample, consider $y_i = 2^{-i}$ and $x_i = a_n(y_i - b_n)$ for $1 \leq i \leq n$, where $a_n$ and $b_n$ are chosen to satisfy
$$ \sum_{i=1}^{n} x_i = 0 \qquad\text{and}\qquad \frac{1}{n-1}\sum_{i=1}^{n} x_i^2 = 1. $$
Now let $I$ be a random set chosen uniformly among the subsets of $\{1,2,\dots,n\}$ of size $n/2$. If we set
\begin{align*} X_n = \sum_{i=1}^{n} x_i\mathbf{1}_{\{i \in I\}} \qquad\text{and}\qquad Y_n = \sum_{i=1}^{n} y_i\mathbf{1}_{\{i \in I\}}, \end{align*}
then, since $X_n = a_n\bigl(Y_n - \tfrac{n}{2}b_n\bigr)$ is an affine function of $Y_n$, it is clear that
$$ \frac{X_n - \mathbf{E}[X_n]}{\sqrt{\mathbf{Var}(X_n)}} = \frac{Y_n - \mathbf{E}[Y_n]}{\sqrt{\mathbf{Var}(Y_n)}}, $$
and so, it suffices to study whether $Y_n$ is approximately normal.
Now for each fixed $m$, we may write
$$ Y_n = \sum_{i=1}^{m} 2^{-i} \mathbf{1}_{\{i \in I\}} + \mathcal{O}(2^{-m}). $$
Moreover, the random variables $(\mathbf{1}_{\{i \in I\}})_{i=1}^{m}$ are $\operatorname{Ber}(\frac{1}{2})$-distributed, and their joint distribution converges to that of $m$ independent $\operatorname{Ber}(\frac{1}{2})$ variables as $n\to\infty$. Using this, it is not hard to prove that the distribution of $Y_n$ converges to the uniform distribution on $[0, 1]$. Since $\operatorname{Uniform}(0,1)$ has mean $\frac{1}{2}$ and variance $\frac{1}{12}$, it follows that
$$ \frac{Y_n - \mathbf{E}[Y_n]}{\sqrt{\mathbf{Var}(Y_n)}} \quad \xrightarrow[\text{in dist.}]{n\to\infty} \quad \operatorname{Uniform}(-\sqrt{3},\sqrt{3}), $$
and so the CLT does not hold in this case.
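The limiting uniform law can also be checked exactly in the independent-bit model: with $m$ independent $\operatorname{Ber}(\frac12)$ bits, the map from bit patterns to $\sum_{i=1}^{m} 2^{-i} b_i$ is a bijection onto the dyadic grid $\{k/2^m\}$. A sketch enumerating all patterns for a fixed $m$ (here $m = 10$, an arbitrary choice):

```python
from itertools import product

m = 10
# every bit pattern (b_1, ..., b_m) is equally likely under independent
# Ber(1/2) bits, and sum_i 2^{-i} b_i maps patterns bijectively onto
# the dyadic grid {k / 2^m : k = 0, ..., 2^m - 1}
values = sorted(
    sum(b * 2.0 ** -i for i, b in enumerate(bits, start=1))
    for bits in product((0, 1), repeat=m)
)
expected = [k / 2 ** m for k in range(2 ** m)]
print(values == expected)  # each dyadic value attained exactly once
```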
I also include the probability histogram of $10^5$ samples of $\frac{Y_n - \mathbf{E}[Y_n]}{\sqrt{\mathbf{Var}(Y_n)}}$ for $n = 1000$:
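For anyone who wants to reproduce the histogram, here is a short sketch of the simulation (working with $Y_n$ directly, since standardization makes $a_n, b_n$ irrelevant; fewer samples than above for speed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = 0.5 ** np.arange(1, n + 1)  # y_i = 2^{-i}

# standardized Y_n over many uniformly random half-subsets
samples = np.array([
    y[rng.choice(n, n // 2, replace=False)].sum()
    for _ in range(20_000)
])
z = (samples - samples.mean()) / samples.std()

# a Uniform(-sqrt(3), sqrt(3)) signature: range about +/- 1.73, and
# fourth moment about 9/5 rather than the Gaussian value 3
print(z.min(), z.max(), (z ** 4).mean())
```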