Central Limit Theorem for Non-degenerate U-Statistics


Let $X_1,X_2,\dots$ be i.i.d. random variables and $f\colon\mathbb R^{r} \rightarrow \mathbb R$ be a symmetric function of $r$ variables. For each $n \ge r$, the associated U-statistic is defined as
$$U_n := {n \choose r}^{-1}\sum_{1\le i_1<i_2<\dots<i_r\le n}f(X_{i_1},X_{i_2},\dots,X_{i_r}),$$ which is clearly a symmetric function of $X_1,X_2,\dots,X_n$. Assume $$Ef(X_1,X_2,\dots,X_r) = 0,\quad Ef(X_1,X_2,\dots,X_r)^2 < \infty,$$ and for $g(x):= Ef(x,X_2,\dots,X_r)$ assume $$\eta^2 := E[g(X_1)^2]>0.$$ $$\text{Show that} \quad \frac{\sqrt{n}\,U_n}{r\eta} \rightarrow N(0,1) \quad \text{in distribution as} \; n\rightarrow \infty.$$
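For concreteness, the definition can be evaluated numerically. The sketch below is my own illustration (the kernels and data are not part of the problem); it computes $U_n$ directly from the definition:

```python
import numpy as np
from itertools import combinations
from math import comb

def u_statistic(x, f, r):
    """U_n = C(n, r)^{-1} * sum over 1 <= i_1 < ... < i_r <= n
    of f(x_{i_1}, ..., x_{i_r}).  Brute force, O(n^r); fine for small n."""
    n = len(x)
    total = sum(f(*(x[i] for i in idx)) for idx in combinations(range(n), r))
    return total / comb(n, r)

x = [1.0, 2.0, 4.0]
# r = 1 with the identity kernel recovers the sample mean.
print(u_statistic(x, lambda a: a, 1))  # 7/3
# r = 2 with f(a, b) = (a - b)^2 / 2 recovers the unbiased sample variance.
print(np.isclose(u_statistic(x, lambda a, b: (a - b) ** 2 / 2, 2),
                 np.var(x, ddof=1)))   # True
```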

Is this the central limit theorem for non-degenerate U-statistics? Is there any reference for this type of CLT for U-statistics?

$\textbf{My initial idea is that}$
maybe I can first prove $\sqrt{n}\big|\big|U_n -\frac{r}{n}\sum^n_{i=1}g(X_i)\big|\big|_2 \rightarrow 0$ and then apply a classical central limit theorem to $\frac{r}{\sqrt n}\sum^n_{i=1}g(X_i)$. But I am stuck on proving the first part.

$\textbf{So my question is}$ how to prove $\sqrt{n}\big|\big|U_n -\frac{r}{n}\sum^n_{i=1}g(X_i)\big|\big|_2 \rightarrow 0$? Could you please give me some details about that? Thank you!
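Not a proof, but the claimed limit can be sanity-checked by simulation. The kernel below is my own example: $f(x,y)=(x-y)^2/2-1$ with $X_i\sim N(0,1)$, so $Ef=0$, $g(x)=(x^2-1)/2$, $\eta^2=E[g(X_1)^2]=1/2$, and $U_n=S_n^2-1$ for the unbiased sample variance $S_n^2$; with $r=2$, $\sqrt n\,U_n/(r\eta)$ should look standard normal:

```python
import numpy as np

# Simulation check (illustrative kernel, my own example):
# f(x, y) = (x - y)^2 / 2 - 1 with X_i ~ N(0, 1), so E f = 0.
# Then U_n = S_n^2 - 1 (unbiased sample variance minus 1),
# g(x) = (x^2 - 1)/2 and eta^2 = 1/2, so sqrt(n) * U_n / (2 * eta)
# should be approximately N(0, 1) for large n.
rng = np.random.default_rng(0)
n, reps = 500, 4000
eta = np.sqrt(0.5)

x = rng.standard_normal((reps, n))
u = x.var(axis=1, ddof=1) - 1.0   # one realization of U_n per row
z = np.sqrt(n) * u / (2 * eta)

print(z.mean(), z.std())  # both should be close to 0 and 1
```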


I will show the idea when $r=2$. In this case, $$ f\left(X_{i_1},X_{i_2}\right)=g\left(X_{i_1}\right)+g\left(X_{i_2}\right)+h\left(X_{i_1},X_{i_2}\right) $$ where $h(x,y):=f(x,y)-g(x)-g(y)$ satisfies $\mathbb E\left[h\left(X_{1},x\right)\right]=0$ for every real number $x$. This is the so-called Hoeffding decomposition. As you noticed, it suffices to show that $$ \lim_{n\to +\infty}\frac 1{n^{3/2}}\left\lVert \sum_{1\leqslant i_1\lt i_2\leqslant n}h\left(X_{i_1},X_{i_2}\right)\right\rVert_2=0 $$ or, equivalently, that $$ \lim_{n\to +\infty}\frac 1{n^3}\mathbb E\left[ \left(\sum_{1\leqslant i_1\lt i_2\leqslant n}h\left(X_{i_1},X_{i_2}\right)\right)^2\right]=0. $$ To this aim, expand the square. We have to treat terms of the form $$ a_{i_1,i_2,j_1,j_2}=\mathbb E\left[h\left(X_{i_1},X_{i_2}\right)h\left(X_{j_1},X_{j_2}\right)\right],\quad 1\leqslant i_1\lt i_2\leqslant n,\ 1\leqslant j_1\lt j_2\leqslant n. $$ If $j_2\gt i_2$, then $j_2\notin\{i_1,i_2,j_1\}$, so we write (using independence between $\left(X_{i_1},X_{i_2},X_{j_1}\right)$ and $X_{j_2}$) $$ a_{i_1,i_2,j_1,j_2}=\int_{\mathbb R}\mathbb E\left[h\left(X_{i_1},X_{i_2}\right)h\left(X_{j_1},x\right)\right]\mathrm dP_{X_{j_2}}(x)=0. $$ By similar arguments, we can see that $a_{i_1,i_2,j_1,j_2}=0$ whenever $i_1\neq j_1$ or $i_2\neq j_2$. Hence only the $\binom n2$ diagonal terms with $(i_1,i_2)=(j_1,j_2)$ survive, each equal to $\mathbb E\left[h\left(X_1,X_2\right)^2\right]<\infty$, so the expectation above is $O(n^2)$ and, after dividing by $n^3$, tends to $0$.
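As a concrete sanity check on this computation (my own illustrative example, not part of the answer): take $f(x,y)=(x-y)^2/2-1$ with $X_i\sim N(0,1)$, for which $g(x)=(x^2-1)/2$ and $h(x,y)=-xy$ in closed form. The diagonal-term count then predicts $n^{-3}\,\mathbb E[(\sum_{i<j}h(X_i,X_j))^2]=\frac{n(n-1)/2}{n^3}\approx\frac1{2n}\to 0$, which a quick Monte Carlo estimate reproduces:

```python
import numpy as np

# Sanity check of the vanishing remainder (illustrative example):
# with f(x, y) = (x - y)^2/2 - 1 and X ~ N(0, 1), the Hoeffding
# decomposition gives g(x) = (x^2 - 1)/2 and h(x, y) = -x*y.
# Only diagonal terms survive, so
#   E[(sum_{i<j} h(X_i, X_j))^2] = C(n, 2) * E[h(X_1, X_2)^2] = n(n-1)/2,
# and dividing by n^3 gives roughly 1/(2n) -> 0.
rng = np.random.default_rng(1)

def remainder_ms(n, reps=2000):
    """Monte Carlo estimate of E[(sum_{i<j} h(X_i, X_j))^2] / n^3."""
    vals = np.empty(reps)
    for k in range(reps):
        x = rng.standard_normal(n)
        s = -(x.sum() ** 2 - (x ** 2).sum()) / 2.0  # sum_{i<j} h(x_i, x_j)
        vals[k] = s ** 2
    return vals.mean() / n ** 3

for n in (50, 200, 800):
    print(n, remainder_ms(n))  # theory: (n - 1) / (2 n^2), i.e. ~ 1/(2n)
```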

The argument for general $r$ is similar and also uses Hoeffding's decomposition.