Limiting distribution of binary variable (Central limit theorem fails)


Suppose we have a random variable $$Y_i = i \text{ with probability } \frac{1}{i}$$ and $0$ otherwise, where all the $Y_i$ are independent. We can redefine $X_i = Y_i - 1$ so that $E(X_i)=0$. Then the variance of $X_i$ is $(i-1)^2\cdot \frac{1}{i} + (-1)^2\cdot\left(1-\frac{1}{i}\right) = i-1$, and $s_n=\sum_{i=1}^{n}(i-1)=\frac{n(n-1)}{2}$. Define $S_n = \sum_{i=1}^{n}X_i$.
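As a quick sanity check on these moments (a Python sketch, not part of the original post), the two-point distribution of $X_i$ gives the mean and variance exactly:

```python
# Exact mean and variance of X_i = Y_i - 1, where
# Y_i = i with probability 1/i and 0 otherwise.
from fractions import Fraction

def moments(i):
    """Return (E[X_i], Var[X_i]) as exact fractions."""
    p = Fraction(1, i)
    # X_i takes the value i-1 with probability 1/i, and -1 otherwise.
    mean = (i - 1) * p + (-1) * (1 - p)
    var = (i - 1) ** 2 * p + (-1) ** 2 * (1 - p) - mean ** 2
    return mean, var

for i in (2, 5, 100):
    print(i, moments(i))   # mean is 0, variance is i - 1
```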

One can check that the CLT does not apply here and that $\frac{S_n}{\sqrt{s_n}}$ does not converge in distribution to the standard normal.

Any thoughts on what is the limiting distribution and how to get it?



The Lindeberg Central Limit Theorem and Feller-Lévy condition establish that the limiting distribution is not a standard normal distribution. (Many thanks to @Q9y5 for pointing out the correct reading of the Lindeberg Central Limit Theorem vis-a-vis the Feller-Lévy condition).

The following statement of the Lindeberg Central Limit Theorem is based on Wikipedia (which cites Billingsley, 1986, p.369), but see also WolframMathWorld: Lindeberg-Feller Central Limit Theorem and WolframMathWorld: Lindeberg Condition.

In all of the following let $\{ X_{1},\ldots ,X_{n} \}$ be a sequence of independent random variables, each with finite expected value $ \mu _{i}$ and variance $\sigma _{i}^{2}$, and construct $$s_{n}^{2}=\sum _{i=1}^{n}\sigma _{i}^{2}$$ (note: notation for $s_n$ is different to your usage).

Lindeberg Condition. The Lindeberg condition is that for all $\varepsilon >0$ $$\lim _{n\to \infty }{\frac {1}{s_{n}^{2}}}\sum _{i=1}^{n}\mathbb {E} \left[(X_{i}-\mu _{i})^{2}\cdot \mathbf {1} _{\left\{X_{i}:\left|X_{i}-\mu _{i}\right|>\varepsilon s_{n}\right\}}\right]=0 $$ where $\mathbf{{1}_{\{\ldots \}}}$ is the indicator function.

Feller-Lévy condition. The Feller-Lévy condition is $$\max _{k=1,\ldots ,n}{\frac {\sigma _{k}^{2}}{s_{n}^{2}}}\to 0 \quad \text{as $n \to \infty$} $$

Lindeberg Central Limit Theorem. If Lindeberg's condition holds then the distribution of the standardized sums $${\frac {1}{s_{n}}}\sum _{i=1}^{n}\left(X_{i}-\mu _{i}\right) \tag{1}\label{clt:sum}$$ converges towards the standard normal distribution $\mathcal{N}(0,1)$. Moreover if the Feller-Lévy condition holds then Lindeberg's condition is both sufficient and necessary for convergence of (\ref{clt:sum}) to $\mathcal{N}(0,1)$.

Now you calculated $$\begin{align*} \sigma _{k}^{2} &= k - 1 \\ s_n^2 &= \frac{n(n-1)}{2} \end{align*}$$ So $$ \max _{k=1,\ldots ,n}{\frac {\sigma _{k}^{2}}{s_{n}^{2}}} = \frac{2(n-1)}{n(n-1)} = \frac{2}{n} \to 0 \quad \text{as $n \to \infty$} $$ hence the Feller-Lévy condition is satisfied.
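This can be confirmed numerically (an illustrative Python sketch, not part of the original answer): the Feller-Lévy ratio is exactly $2/n$ for every $n$:

```python
# Check that max_k sigma_k^2 / s_n^2 = 2/n for this example.
def feller_levy_ratio(n):
    variances = [k - 1 for k in range(1, n + 1)]   # sigma_k^2 = k - 1
    s2 = sum(variances)                            # s_n^2 = n(n-1)/2
    return max(variances) / s2

for n in (10, 1000, 100000):
    print(n, feller_levy_ratio(n))   # equals 2/n, vanishing as n grows
```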

But the Lindeberg condition does not hold for $\varepsilon = 1$, as follows. For $n \geq 3$ we have $s_n > 1$, so the event $\left|X_{i}-\mu _{i}\right| > \varepsilon s_{n} = s_n$ can only occur when $X_i = i-1$ (the other value, $X_i = -1$, satisfies $\lvert -1 \rvert \le s_n$). Hence $$ \mathbb{E}\left[(X_{i}-\mu _{i})^{2}\cdot \mathbf{1}_{\left\{\left|X_{i}-\mu_{i}\right|>\varepsilon s_{n}\right\}}\right] = (i-1)^2 \cdot P(X_i = i-1) = \frac{(i-1)^2}{i} $$ whenever $i - 1 > s_n$, and $0$ otherwise. Now $s_n = \sqrt{n(n-1)/2} < n/\sqrt{2}$, so every index $i$ with $i - 1 > n/\sqrt{2}$ contributes, and there are at least $n\left(1 - \frac{1}{\sqrt 2}\right) - 2$ such indices. For each of them (using $i \ge 2$) $$ \frac{(i-1)^2}{i} \ge \frac{i-1}{2} > \frac{n}{2\sqrt 2}. $$ Therefore $$\begin{align*} \lim _{n\to \infty }{\frac {1}{s_{n}^{2}}}\sum _{i=1}^{n}\mathbb {E} \left[(X_{i}-\mu _{i})^{2}\cdot \mathbf {1} _{\left\{\left|X_{i}-\mu _{i}\right|>\varepsilon s_{n}\right\}}\right] &\geq \lim _{n\to \infty } \frac{2}{n(n-1)} \cdot \left( n\left(1-\tfrac{1}{\sqrt 2}\right) - 2 \right) \cdot \frac{n}{2\sqrt 2} \\&= \frac{1}{\sqrt 2}\left(1 - \frac{1}{\sqrt 2}\right) = \frac{1}{\sqrt 2} - \frac{1}{2} > 0 \end{align*}$$ (A more careful computation shows the sum actually converges to $\tfrac12$.) So the Lindeberg condition fails, and since the Feller-Lévy condition holds, the standardized sums (\ref{clt:sum}) do not converge to $\mathcal{N}(0,1)$.
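The failure of the Lindeberg condition at $\varepsilon = 1$ can also be checked numerically; the following Python sketch evaluates the Lindeberg sum exactly from the two-point distributions (no sampling), and shows it does not tend to $0$:

```python
import math

def lindeberg_sum(n, eps=1.0):
    """Exact value of (1/s_n^2) * sum_i E[X_i^2 * 1{|X_i| > eps*s_n}]
    for X_i = Y_i - 1, where Y_i = i with probability 1/i."""
    s = math.sqrt(n * (n - 1) / 2)
    total = 0.0
    for i in range(1, n + 1):
        # Value i-1 occurs with probability 1/i; it contributes if i-1 > eps*s.
        if i - 1 > eps * s:
            total += (i - 1) ** 2 / i
        # Value -1 occurs with probability 1 - 1/i; contributes only if 1 > eps*s.
        if 1 > eps * s:
            total += 1 - 1 / i
    return total / s**2

for n in (100, 10000, 100000):
    print(n, lindeberg_sum(n))   # stays near 1/2 instead of tending to 0
```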


Here are the results of a simulation (MATLAB 2017a). Note that I calculate $s_n^2 = n(n-1)/2$ and then normalize by $s_n$.

(Figures: histogram-estimated PDFs of $S_n/s_n$ for $n$ = 1,000; 10,000; 100,000; and 1,000,000.)

n = 1000000;  % Number of random variables.
m = 100000;   % Number of samples of the sum.

% Draw samples of Sn, the sum of random variables X.
Sn = NaN(m,1);
for k = 1 : m
    X = integerweightedreciprocalbernoullirnd(n);
    Sn(k) = sum(X);
end

% Calculate sn.
sn = sqrt(n*(n-1)/2);

% Generate histogram of Sn/sn.
histogram(Sn/sn,'Normalization','pdf');
title(['PDF for $S_n / s_n$ : n = ', int2str(n), ...
       ' : ' int2str(m) ' samples'], 'Interpreter', 'LaTeX');


% Returns: A sample of X_1, ... X_n as a row vector.
% X_i is generated as follows:
% - Draw U_i as uniform on [0,1].
% - Let p = 1/i and B_i be a Bernoulli random variable with
%   probability p. Then B_i = 1 with probability 1/i, 0 otherwise.
% - Let alpha = i and Y_i = alpha*B_i. Then Y_i = i with 
%   probability 1/i, 0 otherwise.
% - Let X_i = Y_i - 1.
function X = integerweightedreciprocalbernoullirnd(n)

p = 1./(1:n);
alpha = 1:n;

U = rand(1,n);  % Uniform distribution.
B = U <= p;     % Bernoulli random variable, p = 1/i.
Y = alpha.*B;   % i with probability 1/i, 0 otherwise.
X = Y - 1;      % Mean 0.

end
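For readers without MATLAB, here is a rough Python analogue of the simulation above (a sketch using `numpy`, with a fixed seed for reproducibility, and smaller $n$ and $m$ than the MATLAB run to keep it fast):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000   # number of random variables per sum
m = 500      # number of samples of the sum

i = np.arange(1, n + 1)
sn = np.sqrt(n * (n - 1) / 2)   # standard deviation of S_n

# Each row is one sample of X_1, ..., X_n with X_i = i*Bernoulli(1/i) - 1.
U = rng.random((m, n))
X = (U < 1.0 / i) * i - 1
Z = X.sum(axis=1) / sn          # standardized sums S_n / s_n

# Mean near 0 and standard deviation near 1 by construction,
# but a histogram of Z is visibly non-Gaussian.
print(Z.mean(), Z.std())
```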