Proof of a limiting distribution


I have read in a book that the following should hold, but I can't seem to prove it.

Let $ X_i $ be i.i.d. random variables with $ P(X_i > t) = e^{-t}$. Define $$ Y_i = 1 \text{ if }|X_{i+1} - X_i| > \log(n) \text{ for } i\leq n$$ and $ Y_i = 0 $ otherwise. Let $ W_n = \sum_{i=1}^{n}Y_i$. Then $ W_n $ converges in distribution to a compound Poisson variable as $n \to \infty$.

I really can't figure out how to get this result. I assume the underlying proof resorts to some version of Stein's method, but I can't see it myself.

Any help appreciated! Thanks

Best Answer

We claim that

Proposition. Let $W_n$ be as in OP. Then as $n\to\infty$, $$ W_n \quad \Rightarrow \quad \sum_{i=1}^{N} \xi_i , \tag{*}$$ where

  • $N \sim \operatorname{Poisson}(2/3)$,
  • $\xi_i \sim \nu$, where $\nu = \frac{1}{2}(\delta_1 + \delta_2)$ is a shifted Bernoulli distribution, and
  • $N, \xi_1, \xi_2, \ldots$ are independent.
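Before turning to the proof, the proposition can be sanity-checked numerically. The following sketch (assuming NumPy; the parameters `n` and `trials` are arbitrary choices) compares the empirical law of $W_n$ with samples from the claimed compound Poisson limit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 2000, 3000

# Empirical samples of W_n = sum_{i<=n} 1[|X_{i+1} - X_i| > log n]
X = rng.exponential(size=(trials, n + 1))
W = (np.abs(np.diff(X, axis=1)) > np.log(n)).sum(axis=1)

# Samples from the claimed limit: N ~ Poisson(2/3), xi_i uniform on {1, 2}
N = rng.poisson(2 / 3, size=trials)
limit = np.array([rng.integers(1, 3, size=m).sum() for m in N])

# The two empirical distributions should roughly agree
print(np.bincount(W, minlength=5)[:5] / trials)
print(np.bincount(limit, minlength=5)[:5] / trials)
```

Note that $\mathbf{E}[W_n] = n \cdot \frac{1}{n} = 1$ for every $n$, matching the mean $\frac{2}{3} \cdot \frac{3}{2} = 1$ of the limit, which gives a further consistency check.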

Let $(X_i)_{i\geq 1}$ be a sequence of i.i.d. $\operatorname{Exp}(1)$ variables, and define the events $A_{n,i}$ and random variables $S_{n,k}$ by

$$ A_{n,i} = \{ \left| X_{i+1} - X_i \right| > \log n\} \qquad\text{and}\qquad S_{n,k} = \sum_{i=1}^{k} \mathbb{I}[A_{n,i}]. $$

Lemma. As $k, n \to \infty$ in such a way that $1 \ll k \ll n$, we have $$ \mathbf{P}( S_{n,k} \geq 1) \sim \frac{2k}{3n} $$ as well as $$ \mathbf{P}( S_{n,k} \in \cdot \mid S_{n,k} \geq 1) \quad \Rightarrow \quad \nu(\cdot) = \frac{1}{2}(\delta_1(\cdot) + \delta_2(\cdot)). $$

Here is the intuition: conditioned on $S_{n,k} \geq 1$, the value of $S_{n,k}$ is typically realized by a single 'outlier' $X_i$ satisfying $X_i > \log n$. Closely inspecting the contribution of that single outlier, we obtain the desired assertion of the lemma. The actual proof of the lemma is postponed to the end.

Now fix a sequence $(k_n)$ in $\mathbb{Z}_{>0}$ so that $2 \leq k_n \leq n$ and $1 \ll k_n \ll n$ as $n\to\infty$. We introduce

$$ \tilde{W}_n = \sum_{q=0}^{\lfloor n/k_n\rfloor-1} \tilde{S}_{n,q}, \qquad \text{where} \qquad \tilde{S}_{n,q} = \sum_{r=1}^{k_n-1} \mathbb{I}[A_{n,k_nq + r}]. $$

It is clear that $\tilde{S}_{n,0}, \tilde{S}_{n,1}, \tilde{S}_{n,2}, \ldots$ are identically distributed. Moreover, since each $\tilde{S}_{n,q}$ depends only on $X_{k_n q + 1}, \ldots, X_{k_n q+ k_n}$, they are independent. From this, the m.g.f. of $\tilde{W}_n$ is computed as

\begin{align*} \mathbf{E}[e^{s \tilde{W}_n}] &= \prod_{q=0}^{\lfloor n/k_n\rfloor-1} \mathbf{E}\left[ e^{s\tilde{S}_{n,q}} \right] = \mathbf{E}\left[ e^{s\tilde{S}_{n,0}} \right]^{\lfloor n/k_n\rfloor} \\ &= \left( 1 + \mathbf{E}\left[ e^{s\tilde{S}_{n,0}} - 1 \, \middle|\, \tilde{S}_{n,0} \geq 1 \right]\mathbf{P}(\tilde{S}_{n,0} \geq 1) \right)^{\lfloor n/k_n\rfloor}. \end{align*}

So, letting $n \to \infty$ and applying the lemma,

$$ \mathbf{E}\left[ e^{s\tilde{S}_{n,0}} - 1 \, \middle|\, \tilde{S}_{n,0} \geq 1 \right] \xrightarrow{n\to\infty} \int_{\mathbb{R}} (e^{sx} - 1) \, \nu(\mathrm{d}x). $$

Moreover, $\lfloor n/k_n\rfloor \, \mathbf{P}(\tilde{S}_{n,0} \geq 1) \to \frac{2}{3}$ by the first part of the lemma applied with $k = k_n - 1$, and hence

$$ \lim_{n\to\infty} \mathbf{E}[e^{s \tilde{W}_n}] = \exp\left( \frac{2}{3} \int_{\mathbb{R}} (e^{sx} - 1) \, \nu(\mathrm{d}x) \right). $$

This is precisely the m.g.f. of the compound Poisson distribution in $\text{(*)}$, hence we know that the claim holds with $\tilde{W}_n$ in place of $W_n$.
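For completeness, here is a quick derivation of the m.g.f. of the law in $\text{(*)}$, obtained by conditioning on $N$:

\begin{align*} \mathbf{E}\left[ \exp\left( s \sum_{i=1}^{N} \xi_i \right) \right] &= \sum_{m=0}^{\infty} \mathbf{P}(N = m) \, \mathbf{E}\left[ e^{s\xi_1} \right]^m = \sum_{m=0}^{\infty} e^{-2/3} \frac{(2/3)^m}{m!} \left( \int_{\mathbb{R}} e^{sx} \, \nu(\mathrm{d}x) \right)^m \\ &= \exp\left( \frac{2}{3} \left( \int_{\mathbb{R}} e^{sx} \, \nu(\mathrm{d}x) - 1 \right) \right) = \exp\left( \frac{2}{3} \int_{\mathbb{R}} (e^{sx} - 1) \, \nu(\mathrm{d}x) \right),
\end{align*}

where the last equality uses $\nu(\mathbb{R}) = 1$.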

Finally, noting that $\mathbf{E}[\mathbb{I}[A_{n,1}]] = \mathbf{P}(|X_2 - X_1| > \log n) = \frac{1}{n}$ (the difference of two independent $\operatorname{Exp}(1)$ variables has the standard Laplace distribution) and

$$ W_n - \tilde{W}_n = \sum_{q=1}^{\lfloor n/k_n\rfloor -1} \mathbb{I}[A_{n,k_n q}] + \sum_{i = \lfloor n/k_n\rfloor k_n}^{n} \mathbb{I}[A_{n,i}], $$

we have

$$ \mathbf{E}[|W_n - \tilde{W}_n|] \leq \frac{\lfloor n/k_n\rfloor}{n} + \frac{k_n}{n} \xrightarrow{n\to\infty} 0. $$

This implies that $W_n$ and $\tilde{W}_n$ converge to the same limit in distribution, and therefore the desired claim follows. $\square$


Proof of Lemma. In order to have $S_{n,k} \geq 1$, at least one of $X_1, \ldots, X_{k+1}$ must exceed $\log n$. Let $\mathcal{I}_{n,k}$ be the random index set defined by

$$ \mathcal{I}_{n,k} = \{ i \in [k+1] : X_i > \log n\}, $$

where $[k+1] = \{1,2,\ldots,k+1\}$. Then for each $E \subseteq \mathbb{Z}_{>0}$,

\begin{align*} \mathbf{P}( S_{n,k} \in E ) = \mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} \neq \varnothing ) = \sum_{\substack{I \subseteq [k+1] \\ I \neq \varnothing}} \mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} = I ). \end{align*}

However, for each non-empty $I \subseteq [k+1]$, we have

\begin{align*} \mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} = I ) \leq \mathbf{P}( I \subseteq \mathcal{I}_{n,k} ) = \mathbf{P}(\cap_{i \in I} \{ X_i > \log n \}) = \frac{1}{n^{|I|}}. \end{align*}

Plugging this bound into the sum above, for $1 \leq k \leq n$, we obtain

\begin{align*} \left| \mathbf{P}( S_{n,k} \in E ) - \sum_{i=2}^{k} \mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} = \{i\} ) \right| &\leq \frac{2}{n} + \sum_{j=2}^{k+1} \binom{k+1}{j}\frac{1}{n^j} \\ &\leq C\left(\frac{1}{n}+\frac{k^2}{n^2}\right) \tag{L.1} \end{align*}

for some absolute constant $C > 0$. On the other hand, when $\mathcal{I}_{n,k} = \{i\}$ is a singleton, only $\mathbb{I}[A_{n,i-1}]$ and $\mathbb{I}[A_{n,i}]$ can contribute to the value of the sum $S_{n,k}$. So, if $2 \leq i \leq k$, then by symmetry (exchangeability of the $X_j$),

\begin{align*} &\mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} = \{i\} ) \\ &= \mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} = \{2\} )\\ &= \mathbf{P}( \mathbb{I}[A_{n,1}] + \mathbb{I}[A_{n,2}] \in E, \ \mathcal{I}_{n,k} = \{2\} ) \\ &= \mathbf{P}( \mathbb{I}[A_{n,1}] + \mathbb{I}[A_{n,2}] \in E , \ X_1, X_3 \leq \log n \mid X_2 > \log n ) \\ &\quad \times \mathbf{P}(X_2 > \log n) \mathbf{P}(\cap_{j=4}^{k+1} \{ X_j \leq \log n\}). \end{align*}

Now write $\tilde{X}_2 = X_2 - \log n$. By the memorylessness property, the law of $\tilde{X}_2$ conditioned on $\{X_2 > \log n\}$ is again $\operatorname{Exp}(1)$, so $(X_1, \tilde{X}_2, X_3)$ conditioned on this event has the same joint law as $(X_1, X_2, X_3)$ has unconditionally. Hence,

\begin{align*} &\mathbf{P}( S_{n,k} \in E , \ \mathcal{I}_{n,k} = \{i\} ) \\ &= \mathbf{P}( \mathbb{I}[X_2 > X_1] + \mathbb{I}[X_2 > X_3] \in E , \ X_1, X_3 \leq \log n ) \frac{(1 - \frac{1}{n})^{k-2}}{n}. \tag{L.2} \end{align*}

So, using $\text{(L.1)}$ and $\text{(L.2)}$ and noting that

\begin{align*} \mathbf{P}( \mathbb{I}[X_2 > X_1] + \mathbb{I}[X_2 > X_3] = s , \ X_1, X_3 \leq \log n ) = \begin{cases} \frac{1}{3} + \mathcal{O}\left(\frac{1}{n}\right), & \text{if } s = 1, 2, \\ 0, & \text{if } s \geq 3. \end{cases} \end{align*}

we conclude

\begin{align*} \mathbf{P}( S_{n,k} = s ) = \begin{cases} \frac{k-1}{n}\left(1 - \frac{1}{n}\right)^{k-2} \left( \frac{1}{3} + \mathcal{O}\left(\frac{1}{n}\right) \right) + \mathcal{O}\left(\frac{1}{n}+\frac{k^2}{n^2}\right), & \text{if } s = 1, 2, \\ \mathcal{O}\left(\frac{1}{n}+\frac{k^2}{n^2}\right), & \text{if } s \geq 3. \end{cases} \end{align*}
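The value $\frac{1}{3}$ appearing here is just the probability that $X_2$ is the largest (giving $s = 2$), respectively the middle value (giving $s = 1$), among three i.i.d. $\operatorname{Exp}(1)$ variables. A quick Monte Carlo sanity check (assuming NumPy; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X1, X2, X3 = rng.exponential(size=(3, 200_000))

# S = I[X2 > X1] + I[X2 > X3]; by exchangeability each rank of X2 has prob 1/3
S = (X2 > X1).astype(int) + (X2 > X3).astype(int)
print((S == 2).mean(), (S == 1).mean())  # both close to 1/3
```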

Now, combining everything, we find that

$$ \mathbf{P}( S_{n,k} \in E \mid S_{n,k} \geq 1) = \frac{ \delta_{1}(E) + \delta_2(E) + \mathcal{O}(\varepsilon_{n,k})}{2 + \mathcal{O}(\varepsilon_{n,k})}, $$

where $\varepsilon_{n,k}$ takes the form

$$ \varepsilon_{n,k} = \frac{1}{n} + \frac{\frac{1}{k}+\frac{k}{n}}{\left(1 - \frac{1}{n}\right)^{k-2}}. $$

Since $\varepsilon_{n,k} \to 0$ as $k, n \to \infty$ in such a way that $1 \ll k \ll n$, the second claim follows; moreover, summing the previous display over $s \in \{1, 2\}$ gives $\mathbf{P}(S_{n,k} \geq 1) \sim \frac{2k}{3n}$, which is the first claim. $\square$