Show that if $\frac{S_n - m_n}{s_n} \xrightarrow{d} \mathcal{N}(0, 1)$ with $X_k \sim \text{Ber}(p_k)$ then $\sum_k p_k(1-p_k) = +\infty$


Let $(X_n)$ be a sequence of independent random variables, $X_k \sim \text{Ber}(p_k) \ \forall k \ge 1$. Set $S_n = \sum_{k = 1}^n X_k, m_n = \sum_{k = 1}^n p_k, s_n^2 = \sum_{k = 1}^n p_k(1-p_k)$. Show that $$ \dfrac{S_n -m_n}{s_n} \xrightarrow{d} \mathcal{N}(0, 1) \Longleftrightarrow \sum_{k = 1}^{\infty} p_k(1-p_k) = +\infty $$
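
As a quick numerical sanity check of the statement (a sketch using numpy, not part of any proof; the two choices of $p_k$ are just illustrative), one can simulate $(S_n - m_n)/s_n$ in each regime:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_sum(p, trials=20000):
    """Sample (S_n - m_n)/s_n, with S_n a sum of independent Ber(p_k)."""
    m, s = p.sum(), np.sqrt((p * (1 - p)).sum())
    X = rng.random((trials, p.size)) < p    # row i is one realization of X_1..X_n
    return (X.sum(axis=1) - m) / s

n = 2000
k = np.arange(1, n + 1)
for label, p in [("p_k = 1/2       (sum diverges) ", np.full(n, 0.5)),
                 ("p_k = 1/(k+1)^2 (sum converges)", 1.0 / (k + 1)**2)]:
    z = standardized_sum(p)
    # For N(0,1), P(|Z| <= 1) is about 0.683; the convergent case is visibly discrete.
    print(label, "| P(|Z|<=1) ≈", np.mean(np.abs(z) <= 1).round(3),
          "| distinct values:", np.unique(z.round(6)).size)
```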

I'm able to show the $\Leftarrow$ direction by checking the Lindeberg conditions. However, I'm stuck on the $\Rightarrow$ direction and don't see any way to approach it, so any hints are appreciated. Thank you.

Update: The Lindeberg conditions I'm using are from Allan Gut's book "Probability: A Graduate Course", where the author states the following Lindeberg–Lévy–Feller theorem:

Theorem. Let $X_1, X_2, \ldots$ be independent random variables with finite variances, and set, for $k \ge 1$, $\mathbb{E}(X_k) = \mu_k$, $\text{Var}(X_k) = \sigma_k^2$, and, for $n \ge 1$, $S_n = \sum_{k = 1}^n X_k$, $s_n^2 = \sum_{k = 1}^n \sigma_k^2$. Consider the conditions $$ L_1(n) = \max_{1 \le k \le n} \dfrac{\sigma_k^2}{s_n^2} \rightarrow 0 \text{ as } n \rightarrow \infty \tag{1} $$ and $$ L_2(n) = \dfrac{1}{s_n^2}\sum_{k = 1}^n \mathbb{E}[\vert X_k - \mu_k \vert^2 1\{\vert X_k - \mu_k \vert > \epsilon s_n\}] \rightarrow 0 \text{ as } n \rightarrow \infty. \tag2 $$ Then: (i) If $(2)$ is satisfied, then so is $(1)$, and $$ \dfrac{1}{s_n}\sum_{k = 1}^n (X_k - \mu_k) \xrightarrow{d} \mathcal{N}(0, 1). \tag3 $$ (ii) If $(1)$ and $(3)$ are satisfied, then so is $(2)$.
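
For the Bernoulli case, both $L_1(n)$ and $L_2(n)$ can be evaluated in closed form, since $X_k - p_k$ takes only the two values $1 - p_k$ and $-p_k$. A small numpy sketch of my own (not from Gut's book) that computes them exactly:

```python
import numpy as np

def lindeberg_terms(p, eps=0.1):
    """Exact L1(n) and L2(n) from Gut's theorem for independent Ber(p_k)."""
    var = p * (1 - p)
    s2 = var.cumsum()                            # s_n^2 for n = 1, 2, ...
    sn = np.sqrt(s2)
    L1 = np.maximum.accumulate(var) / s2
    # X_k - p_k equals 1-p_k w.p. p_k and -p_k w.p. 1-p_k, so the truncated
    # second moment in L2(n) is a sum of two indicator terms:
    L2 = np.array([
        np.sum(p[:n] * (1 - p[:n])**2 * ((1 - p[:n]) > eps * sn[n - 1])
               + (1 - p[:n]) * p[:n]**2 * (p[:n] > eps * sn[n - 1]))
        for n in range(1, p.size + 1)
    ]) / s2
    return L1, L2

p = np.full(5000, 0.5)                           # a divergent example: s_n^2 = n/4
L1, L2 = lindeberg_terms(p)
print(L1[-1], L2[-1])                            # L2 is exactly 0 once eps * s_n > 1
```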


Accepted answer:

Suppose not. Then $s_n \to s$ for some finite $s > 0$, and since $(S_n - m_n)/s_n \leadsto N(0, 1)$, Slutsky's theorem gives $Z_n := \sum_{k=1}^n(X_k - p_k) \leadsto N(0, s^2)$. Pick an index $j$ with $0 < p_j < 1$ (one exists because $s > 0$); for notational ease take $j = 1$, and let $Z_{-1,n} = \sum_{k=2}^n(X_k - p_k)$, so that $(X_1 - p_1) + Z_{-1,n} \leadsto N(0, s^2)$. Since $\sum_{k \ge 2} p_k(1-p_k) < \infty$, the sums $Z_{-1,n}$ are Cauchy in $L^2$ and hence converge in distribution to some $Z_{-1}$, independent of $X_1 - p_1$. Therefore $(X_1 - p_1) + Z_{-1}$ is normal. By Cramér's decomposition theorem, both independent summands must be normal; in particular $X_1 - p_1$ is normal, contradicting that $X_1$ is a non-degenerate Bernoulli variable.
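
As a sanity check on the contradiction (illustration only; $\sigma$ is an arbitrary stand-in for the standard deviation of $Z_{-1}$): a centered coin flip plus an independent Gaussian is never Gaussian, which is the special case Cramér's theorem rules out here.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.7                        # hypothetical standard deviation of Z_{-1}
N = 10**6
W = (rng.integers(0, 2, N) - 0.5) + sigma * rng.standard_normal(N)

# Excess kurtosis of W is kappa_4 / Var^2 = (-1/8)/(1/4 + sigma^2)^2 < 0,
# whereas any normal law has excess kurtosis exactly 0.
c = W - W.mean()
print("excess kurtosis ≈", (np.mean(c**4) / np.mean(c**2)**2 - 3).round(3))
```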

Another answer:

First, note that if $s_{n}^{2}$ converges, then, as I said in the comments, $$\|S_{n}-m_{n}-(S_{k}-m_{k})\|_{L^{2}}^{2}=\text{Var}(S_{n}-S_{k})=|s_{n}^{2}-s_{k}^{2}|,$$ which tends to $0$ as $n, k \to \infty$ because $(s_{n}^{2})$ is a Cauchy sequence.

Hence $(S_{n}-m_{n})$ is an $L^{2}$ Cauchy sequence, so it converges in $L^{2}$ as well as almost surely. (If you don't see this jump, take it as a fact that for sums of independent variates, being Cauchy in probability is equivalent to being Cauchy almost surely, by a theorem of Lévy. Otherwise, just pass to a subsequence of $(S_{n}-m_{n})$, which I do not relabel, along which you have almost sure convergence.) Let $S_{n}-m_{n}\to X$ for some random variable $X$.
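
A quick simulation consistent with this (a sketch with the hypothetical choice $p_k = 1/(k+1)^2$, for which $\sum_k p_k(1-p_k) < \infty$): along a single realization, the trajectory of $S_n - m_n$ visibly settles down.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100000
p = 1.0 / (np.arange(1, N + 1) + 1)**2           # p_k = 1/(k+1)^2
X = rng.random(N) < p
path = np.cumsum(X - p)                          # S_n - m_n along one realization
for n in (100, 1000, 10000, 100000):
    print(n, path[n - 1])                        # values stabilize as n grows
```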

Now consider the case where $m_{n}$ also converges (say to $m$).

Then $S_{n}-m_{n}\to X$ implies $S_{n}\to X+m$. But since $m_{n}$ converges, we have $\sum_k p_k < \infty$, so by the Borel–Cantelli lemma only finitely many of the $X_{k}$ equal $1$ almost surely.

Thus $X+m$ is almost surely an integer-valued random variable. This is impossible if $X$ is a normal variate (i.e., if $X$ is $N(0,\lim_{n\to\infty} s_{n}^{2})$-distributed).
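
To see this numerically (a sketch with the hypothetical choice $p_k = 2^{-k}$, so that both $m_n$ and $s_n^2$ converge): the simulated limit sits on the integers shifted by $m$ and carries an atom, which no normal law can have.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 60, 50000
p = 0.5 ** np.arange(1, n + 1)                   # p_k = 2^{-k}
m = p.sum()                                      # m_n = 1 - 2^{-n}, close to m = 1
Z = ((rng.random((trials, n)) < p) - p).sum(axis=1)   # S_n - m_n

# S_n is the integer count of ones, so Z + m is an integer in every trial:
print("max |Z + m - round(Z + m)| =", np.abs(Z + m - np.round(Z + m)).max())
print("P(S_n = 0) ≈", np.mean(np.round(Z + m) == 0).round(3))   # an atom ≈ 0.289
```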

The only remaining subcase is when $s_{n}^{2}$ converges but $m_{n}$ does not. I think an elementary real-analysis argument will show that $\lim_{n\to\infty}(S_{n}-m_{n})$ is almost surely a discrete random variable, so the limit cannot possibly be Gaussian; but, embarrassingly, I am currently unable to prove it. Perhaps later, when my brain frees up, I'll post an update; meanwhile, I would welcome comments and discussion of my answer. It should go something like this, though (see also the sketch after the next paragraph): the set of values of $\lim_{n\to\infty}(S_{n}-m_{n})$ should be almost surely contained in $D=\bigcup_{n\in\Bbb{N}}F_{n}\cup\{\pm\infty\},$

where $F_{k}$ is the set of possible values of $S_{k}-m_{k}$ (so $F_{k}$ has cardinality at most $2^{k}$). Then almost surely the limiting variate takes values in $D$, which is impossible for a normal distribution $\mathbb{P}_{N}$: the union $\bigcup_{n\in\Bbb{N}}F_{n}$ is countable, so $\mathbb{P}_{N}(\bigcup_{n\in\Bbb{N}}F_{n})=0$, and $\mathbb{P}_{N}(\{\pm\infty\})=0$, hence $\mathbb{P}_{N}(D)=0$. So the limit cannot possibly be normally distributed.
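
For what it's worth, here is a concrete instance of this subcase that one can simulate (a sketch with the hypothetical choice $p_k = 1 - 2^{-k}$, so $s_n^2$ converges while $m_n \sim n$ diverges). Writing $S_n - m_n = (S_n - n) + \sum_{k \le n}(1 - p_k)$, the first term is minus the number of zeros seen so far (eventually constant a.s., by Borel–Cantelli applied to $\sum_k(1-p_k) < \infty$) and the second tends to $1$, so for this choice the limit is indeed discrete:

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 60, 50000
p = 1.0 - 0.5 ** np.arange(1, n + 1)             # p_k = 1 - 2^{-k}
zeros = (rng.random((trials, n)) >= p).sum(axis=1)   # J = total number of zeros
Z = (1 - 0.5**n) - zeros                         # S_n - m_n = (n - m_n) + (S_n - n)

print("distinct limit values observed:", np.unique(Z).size)
print("P(J = j) for j = 0..4:", [np.mean(zeros == j).round(3) for j in range(5)])
```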

As for the other direction: as I said in the comments, it is often easier to verify Lyapunov's condition, since Lindeberg's condition is not one that is easy to check directly.

To do that, just notice that $$\mathbb{E}\big[(X_{k}-p_{k})^{4}\big]=p_{k}(1-p_{k})^{4} + (1-p_{k})p_{k}^{4}=p_{k}(1-p_{k})\big[(1-p_{k})^{3}+p_{k}^{3}\big]\leq p_{k}(1-p_{k}).$$

Thus $$\frac{1}{s_{n}^{4}}\sum_{k=1}^{n}\mathbb{E}\big[(X_{k}-p_{k})^{4}\big]\leq \frac{s_{n}^{2}}{s_{n}^{4}}=\frac{1}{s_{n}^{2}}\to 0$$ whenever $s_{n}^{2}\to\infty$, which verifies Lyapunov's condition and hence also Lindeberg's condition.
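
One can also watch this ratio decay numerically (a sketch with the hypothetical choice $p_k = 1/(k+1)$, for which $s_n^2 \sim \log n \to \infty$):

```python
import numpy as np

k = np.arange(1, 10**6 + 1)
p = 1.0 / (k + 1)                                # p_k = 1/(k+1): s_n^2 ~ log n
s2 = np.cumsum(p * (1 - p))
m4 = np.cumsum(p * (1 - p)**4 + (1 - p) * p**4)  # running sum of E[(X_k - p_k)^4]
ratio = m4 / s2**2                               # the Lyapunov ratio displayed above
print(ratio[[10**3 - 1, 10**4 - 1, 10**6 - 1]])  # decreases toward 0, roughly like 1/log n
```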