When is cdf $F_{X_1+\dots+X_n}(c)$ of sum of iid zero mean random variables decreasing in sample size $n$?


Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with mean zero (e.g., $N(0,1)$). Let $n > m$ and $c \geq 0$. I want to show that $$ P\left(\sum_{i=1}^n X_i \leq c \right) \leq P \left( \sum_{i=1}^m X_i \leq c \right).$$ In view of existing concentration bounds like Hoeffding's inequality, which scale with the length of the sequence (i.e., with $n$ and $m$), I would expect the above statement to hold.

Edit: Since it was pointed out that this does not hold in specific cases where $n$ and $m$ are small and the distribution of the $X_i$ is discrete, assume that $X_1, X_2, \dots$ are Gaussian and that $n$ is sufficiently large.

There are 3 answers below.

Accepted answer

Under the Gaussian assumption we can compute both probabilities in closed form. It suffices to note that $\sum_{i=1}^nX_i\sim \sqrt{n}\,\mathcal{N}(0,1)$, so $$\mathbb{P}\left(\sum_{i=1}^nX_i\le c \right) = \mathbb{P}\left(\mathcal{N}(0,1)\le \frac{c}{\sqrt n} \right) = \Phi\left(\frac{c}{\sqrt n}\right).$$

Since $\Phi$ is nondecreasing and $\frac{c}{\sqrt n} \le \frac{c}{\sqrt m}$ for $c \ge 0$ and $n > m$, we have $$\mathbb{P}\left(\sum_{i=1}^nX_i\le c \right)\le \mathbb{P}\left(\sum_{i=1}^mX_i\le c \right).$$
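This closed form is easy to check numerically. A minimal sketch (function names are mine), using the standard normal cdf expressed through the error function:

```python
import math

def phi(x):
    """Standard normal cdf, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_sum_leq(n, c):
    """P(X_1 + ... + X_n <= c) for iid N(0,1), i.e. Phi(c / sqrt(n))."""
    return phi(c / math.sqrt(n))

c = 1.0
probs = [prob_sum_leq(n, c) for n in range(1, 11)]
# For c >= 0 the probabilities are non-increasing in n.
assert all(a >= b for a, b in zip(probs, probs[1:]))
```

Note that for $c < 0$ the same formula gives $\frac{c}{\sqrt n} > \frac{c}{\sqrt m}$, so the probabilities increase in $n$; the restriction $c \ge 0$ is essential.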

Answer 2

Take $X_i$ such that $X_i=-2$ with probability $\dfrac{1}{3}$ and $X_i=1$ with probability $\dfrac{2}{3}$. A direct calculation shows that $\mathbb{E}[X_i]=0$.

Take $n=2, m=1$, $c=0$.

Obviously, $$P\left(\sum_{i=1}^m X_i \leq c\right)=P(X_1\leq 0)=P(X_1=-2)=\dfrac{1}{3}$$

Now notice that $X_1+X_2>0$ iff $X_1=X_2=1$, so

$$P\left(\sum_{i=1}^n X_i \leq c\right)=1-P(X_1=1, X_2=1)=1-P(X_1=1) P(X_2=1)=1-\left(\dfrac{2}{3}\right)^2=\dfrac{5}{9}>\dfrac{1}{3}$$
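Both probabilities in this counterexample can be verified by exhaustive enumeration; a small sketch in exact rational arithmetic (names are mine):

```python
from fractions import Fraction
from itertools import product

# Two-point distribution: X = -2 w.p. 1/3 and X = 1 w.p. 2/3 (mean zero).
dist = {-2: Fraction(1, 3), 1: Fraction(2, 3)}

def prob_sum_leq(n, c):
    """P(X_1 + ... + X_n <= c), by enumerating all 2^n outcomes."""
    total = Fraction(0)
    for outcome in product(dist, repeat=n):
        if sum(outcome) <= c:
            p = Fraction(1)
            for x in outcome:
                p *= dist[x]
            total += p
    return total

assert prob_sum_leq(1, 0) == Fraction(1, 3)
assert prob_sum_leq(2, 0) == Fraction(5, 9)  # larger than 1/3: monotonicity fails
```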

Answer 3

Here I show that the inequality in the OP holds for the large class of log-concave distributions, including any zero-mean affine combination of the following well-known distributions:

  • the normal distribution
  • the exponential distribution
  • the uniform distribution
  • the logistic distribution
  • the extreme value distribution
  • the Laplace distribution
  • the chi distribution
  • the hyperbolic secant distribution,

and many other distributions (see [1]). The transformation needs to be such that the mean is zero, e.g.

$$X_1,X_2,\dots \sim a \, \mathcal E (\lambda) -\frac{a}{\lambda} + \mathcal U(-b,b) + \mathcal N(0,\sigma^2)+ c\, \chi^2_k -c\, k $$

for any $a, c \in \mathbb R$, $b, \sigma^2 > 0$, and $k \ge 2$.

Proof

First, consider the following two key properties of log-concave distributions [1]:

If $X$ has a log-concave distribution $f_X$, then its cdf $F_X(x)=\mathbb P (X \le x)$ is also log-concave, that is, $\log F_X(x)$ is a concave function on its support.

The sum of two independent random variables with log-concave distributions again has a log-concave distribution.

Hence, if $X$ has a log-concave distribution, then $\log F_{X_1+\dots +X_n}(x)$ is a concave function, where $X_1,\dots, X_n$ are i.i.d. copies of $X$.

The inequality in the OP for $n>m$:

$$ P\left(\sum_{i=1}^n X_i \leq c \right) \leq P \left( \sum_{i=1}^m X_i \leq c \right),$$

is equivalent to:

$$\color{blue}{\log F_{\sum_{i=1}^n X_i}(c) \leq \log F_{\sum_{i=1}^m X_i}(c)}. $$

The LHS can be written as

$$\log F_{\sum_{i=1}^n X_i}(c)=\mathbb E \left [ \log F_{\sum_{i=1}^m X_i} \left (c-\sum_{i=m+1}^nX_i \right ) \right ].$$

As $\log F_{\sum_{i=1}^m X_i}$ is a concave function, Jensen's inequality gives:

$$\color{blue}{\log F_{\sum_{i=1}^n X_i}(c)}=\mathbb E \left [ \log F_{\sum_{i=1}^m X_i} \left (c-\sum_{i=m+1}^nX_i \right ) \right ] \le \\ \log F_{\sum_{i=1}^m X_i} \left (\mathbb E \left [c-\sum_{i=m+1}^nX_i \right ] \right )=\color{blue}{\log F_{\sum_{i=1}^m X_i} \left (c\right )},$$

in which we used $\mathbb E \left [\sum_{i=m+1}^nX_i \right ]=0$.

Hence, the inequality holds not only for the standard normal distribution but for any other log-concave distribution with zero mean. Note that if $X,Y$ have log-concave distributions, then for any $a,b \in \mathbb R$ the transformation $$a \left (X-\mathbb E(X) \right )+b \left (Y-\mathbb E(Y) \right )$$ has a log-concave distribution with zero mean, e.g., $$X_1, X_2, \dots \sim a \, \mathcal E (\lambda) -\frac{a}{\lambda},$$ $$X_1, X_2, \dots \sim \mathcal U (-a,a), $$ $$X_1, X_2, \dots \sim \mathcal U (-a,a)+ \mathcal N(0,\sigma^2). $$
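As a numerical sanity check of the log-concave case beyond the Gaussian, take $X_i \sim \mathcal E(1) - 1$ (centered exponential). Then $\sum_{i=1}^n X_i + n \sim \mathrm{Gamma}(n,1)$, whose cdf at integer shape $n$ is the Erlang formula $1 - e^{-x}\sum_{k=0}^{n-1} x^k/k!$. A minimal sketch (function names are mine) confirming that the probabilities are non-increasing in $n$ for a few values of $c \ge 0$:

```python
import math

def gamma_cdf_int(n, x):
    """P(Gamma(n, 1) <= x) for integer shape n (Erlang cdf):
    1 - exp(-x) * sum_{k=0}^{n-1} x^k / k!."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x) * sum(x**k / math.factorial(k) for k in range(n))

def prob_centered_exp_sum_leq(n, c):
    """P(sum of n iid (Exp(1) - 1) <= c) = P(Gamma(n, 1) <= c + n)."""
    return gamma_cdf_int(n, c + n)

for c in (0.0, 0.5, 2.0):
    probs = [prob_centered_exp_sum_leq(n, c) for n in range(1, 21)]
    assert all(a >= b for a, b in zip(probs, probs[1:])), f"failed at c={c}"
```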