Symmetrization and Contraction Principle of Random Variables


I was reading a paper and came across the terms symmetrization and contraction principle of random variables. I tried to extract the statements as follows:

Symmetrization: Let $X_1,\dots,X_n$ be independent zero-mean random variables and $p\geq 2$. Then $$\left\|\sum_{i=1}^n a_i X_i\right\|_p \leq 2 \left\|\sum_{i=1}^n a_i \varepsilon_i X_i\right\|_p$$ where the $a_i$ are real numbers and $\varepsilon_1,\dots,\varepsilon_n$ is a sequence of i.i.d. Rademacher random variables, independent of the $X_i$'s.
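Not a proof, but a quick Monte Carlo run makes the statement concrete. In this sketch the centered-exponential $X_i$'s, the coefficients $a_i$, and $p=4$ are my own illustrative choices, not taken from the paper:

```python
# Monte Carlo sanity check of the symmetrization inequality (illustration only).
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 5, 4, 200_000
a = np.array([1.0, -2.0, 0.5, 3.0, -1.5])

X = rng.exponential(size=(trials, n)) - 1.0      # zero-mean X_i
eps = rng.choice([-1.0, 1.0], size=(trials, n))  # independent Rademacher signs

lhs = np.mean(np.abs(X @ a) ** p) ** (1 / p)              # ||sum a_i X_i||_p
rhs = 2 * np.mean(np.abs((eps * X) @ a) ** p) ** (1 / p)  # 2 ||sum a_i eps_i X_i||_p
print(lhs, rhs, lhs <= rhs)  # expect lhs <= rhs
```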

Contraction Principle: Let $X_1,\dots,X_n$ be independent non-negative random variables and $p\geq 2$. Further, suppose that for each $i$ we have $\mathbb{P}(Y_i\geq t)\geq \mathbb{P}(X_i\geq t)$ for all $t>0$, where $Y_1,\dots, Y_n$ are also independent non-negative random variables. Then we have $$\left\|\sum_{i=1}^n a_i \varepsilon_i X_i\right\|_p \leq \left\|\sum_{i=1}^n a_i \varepsilon_i Y_i\right\|_p$$ where the $a_i$ are real numbers and $\varepsilon_1,\dots,\varepsilon_n$ is a sequence of i.i.d. Rademacher random variables, independent of the $X_i$'s and $Y_i$'s.
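Again as a hedged numerical illustration (not from the paper): take $X_i \sim \mathrm{Exp}(\text{rate }2)$ and $Y_i \sim \mathrm{Exp}(\text{rate }1)$, so that $\mathbb{P}(Y_i \geq t) = e^{-t} \geq e^{-2t} = \mathbb{P}(X_i \geq t)$ for all $t > 0$, and compare the two moments by simulation:

```python
# Monte Carlo sanity check of the contraction principle (illustration only).
import numpy as np

rng = np.random.default_rng(1)
n, p, trials = 5, 4, 200_000
a = np.array([1.0, -2.0, 0.5, 3.0, -1.5])

X = rng.exponential(scale=0.5, size=(trials, n))  # Exp(rate 2): stochastically smaller
Y = rng.exponential(scale=1.0, size=(trials, n))  # Exp(rate 1): stochastically larger
eps = rng.choice([-1.0, 1.0], size=(trials, n))

lhs = np.mean(np.abs((eps * X) @ a) ** p) ** (1 / p)  # ||sum a_i eps_i X_i||_p
rhs = np.mean(np.abs((eps * Y) @ a) ** p) ** (1 / p)  # ||sum a_i eps_i Y_i||_p
print(lhs, rhs, lhs <= rhs)  # expect lhs <= rhs
```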

There might be more general statements of these results, but the paper does not cite any, and I am having trouble finding references with exact statements and proofs. If anybody can provide a reference (preferably a textbook) or a hint, that would be greatly appreciated!

For reference, the paper and the argument cited are here, on page 12.

Edit: I am looking for a reference to read, not a direct solution or anything like that.

There are 4 answers below.

Best answer:

The symmetrization step is closely related to Khintchine's inequality, see here: https://en.wikipedia.org/wiki/Khintchine_inequality

For the contraction principle, I don't know any reference, but I can prove it. First note that if $Z\geq 0$ and $p > 0$, then \begin{align} \mathbb{E}[Z^p] = \mathbb{E}\Big[\int_0^{Z} p t^{p-1} dt\Big] = \mathbb{E}\Big[\int_0^{\infty} p t^{p-1} \mathrm{1}_{\{Z\geq t\}} dt\Big] \stackrel{\text{Fubini}}{=} \int_0^{\infty} p t^{p-1} \mathbb{P}(Z\geq t) dt. \end{align} Therefore, if $X,Y\geq 0$ and $\mathbb{P}(X\geq t) \leq \mathbb{P}(Y \geq t)$ for all $t > 0$, then \begin{align*} \mathbb{E}[X^p] = \int_0^{\infty} p t^{p-1} \mathbb{P}(X\geq t) dt \leq \int_0^{\infty} p t^{p-1} \mathbb{P}(Y\geq t) dt = \mathbb{E}[Y^p]. \end{align*} We conclude $(\mathbb{E}[X^p])^{1/p} \leq (\mathbb{E}[Y^p])^{1/p}$ for all $p > 0$.

Thus, with $X = \big|\sum_{i=1}^n a_i \varepsilon_i X_i\big|$ and $Y = \big|\sum_{i=1}^n a_i \varepsilon_i Y_i\big|$, we get the claim.
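As a quick numeric check of the layer-cake identity above (my own example, not part of the answer): for $Z \sim \mathrm{Exp}(1)$ both sides equal $\Gamma(p+1)$.

```python
# Check E[Z^p] = \int_0^\infty p t^{p-1} P(Z >= t) dt for Z ~ Exp(1),
# where P(Z >= t) = exp(-t) and E[Z^p] = Gamma(p + 1).
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

p = 3.5
integral, _ = quad(lambda t: p * t ** (p - 1) * np.exp(-t), 0, np.inf)
print(integral, gamma(p + 1))  # both approximately 11.63
```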


EDIT: We get the claim assuming $\mathbb{P}(\big|\sum_{i=1}^n a_i \varepsilon_i X_i\big| > t) \leq \mathbb{P}(\big|\sum_{i=1}^n a_i \varepsilon_i Y_i\big| > t)$ for all $t > 0$, which does not follow from $\mathbb{P}(X_i > t) \leq \mathbb{P}(Y_i > t)$, $1 \leq i \leq n$, alone. Hmm, so it's not clear that the third line of the middle equation on page 12 of your reference is actually correct; maybe it was forgotten that the $\varepsilon_i$'s can be negative? Maybe there is another argument at play, I'm not sure.

Answer:

Extended comment.

I don't have the kind of concrete knowledge to assess the arguments made in the paper. However, I have come across 'symmetrization' and 'contraction' in the context of Rademacher random variables before. Both principles underlie arguments in statistical learning theory, going back to the original papers of Vapnik and Chervonenkis, and also in the study of empirical processes.

Having skimmed the references in your paper, the book that I recognise from there and which mentions 'symmetrization' in the above context is:

  • A. W. van der Vaart, J.A. Wellner, Weak convergence and empirical processes. With applications to statistics. Springer Series in Statistics. Springer-Verlag, New York, 1996. https://doi.org/10.1007/978-1-4757-2545-2 - see section 2.3.

For symmetrization and contraction principles discussed side by side, there is a short section in the following book on concentration inequalities:

  • Boucheron, S., Lugosi, G., & Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford: Oxford University Press. DOI:10.1093/acprof:oso/9780199535255.001.0001 - see section 11.3 and also bibliographical remarks in that section for further reading references.

There is also a section on symmetrization in B. Sen's introductory notes on empirical process theory for statisticians.


Answer:

I may as well write the proof I alluded to last night in the above comments.

Let $X_1, …, X_n, \epsilon_1, …, \epsilon_n$ be mutually independent random variables. Suppose that $E[X_i]=0$ and $P[\epsilon_i=-1]=P[\epsilon_i=1] = 1/2$ for all $i \in \{1, …, n\}$. Fix $p\geq 1$. For a random variable $Y$ define $$ \|Y\|_p = E[|Y|^p]^{1/p}$$

Claim: For any real numbers $a_1, ..., a_n$ we have $$ \left\|\sum_{i=1}^n a_iX_i \right\|_p \leq 2 \left\|\sum_{i=1}^n a_i\epsilon_i X_i\right\|_p$$

Proof: Without loss of generality we can assume $a_i=1$ for all $i$ (else, just define $Y_i=a_iX_i$).

Let $Z=(Z_1, ..., Z_n)$ be a vector that is independent of $(X_1, ..., X_n, \epsilon_1, ..., \epsilon_n)$ but has the same distribution as $X=(X_1, ..., X_n)$. Then $X_i-Z_i$ has the same distribution as $-(X_i-Z_i)$, and so (using independence across coordinates) the vector $(X_i-Z_i)_{i=1}^n$ has the same joint distribution as $(\epsilon_i(X_i-Z_i))_{i=1}^n$. Then, using Minkowski's inequality in the second step, \begin{align} \left\|\sum_{i=1}^n(X_i-Z_i)\right\|_p &= \left\|\sum_{i=1}^n\epsilon_i(X_i-Z_i)\right\|_p\\ &\leq \left\|\sum_{i=1}^n\epsilon_iX_i\right\|_p + \left\|\sum_{i=1}^n\epsilon_i Z_i\right\|_p\\ &=2\left\|\sum_{i=1}^n\epsilon_iX_i\right\|_p \quad (*) \end{align} where the last equality holds because $(\epsilon_i Z_i)_{i=1}^n$ has the same distribution as $(\epsilon_i X_i)_{i=1}^n$.

On the other hand, by convexity of $|x|^p$ and Jensen's inequality for conditional expectations we get: \begin{align} E\left[\left|\sum_{i=1}^n (X_i-Z_i)\right|^p | X\right] &\geq \left| E\left[\sum_{i=1}^n (X_i-Z_i)|X\right]\right|^p\\ &=\left|\sum_{i=1}^n X_i \right|^p \end{align} where we have used the fact that $E[Z_i|X]=E[Z_i]=0$ for all $i \in \{1, ..., n\}$, since $Z$ is independent of $X$. Taking expectations of both sides gives $$ E\left[\left|\sum_{i=1}^n (X_i-Z_i)\right|^p \right] \geq E\left[\left|\sum_{i=1}^n X_i \right|^p\right]$$ Raising both sides to the $1/p$ power gives $$ \left\|\sum_{i=1}^n (X_i-Z_i)\right\|_p \geq \left\|\sum_{i=1}^n X_i \right\|_p \quad (**)$$ Combining (*) and (**) proves the result.
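A small simulation can illustrate the chain of bounds $(**)$ and $(*)$; the centered-exponential $X_i$'s and $p=4$ below are assumed for illustration only:

```python
# Monte Carlo illustration of
# ||sum X_i||_p <= ||sum (X_i - Z_i)||_p <= 2 ||sum eps_i X_i||_p.
import numpy as np

rng = np.random.default_rng(2)
n, p, trials = 5, 4, 200_000

X = rng.exponential(size=(trials, n)) - 1.0  # zero-mean X_i
Z = rng.exponential(size=(trials, n)) - 1.0  # independent copy of X
eps = rng.choice([-1.0, 1.0], size=(trials, n))

norm_p = lambda s: np.mean(np.abs(s) ** p) ** (1 / p)
print(norm_p(X.sum(axis=1)), "<=",
      norm_p((X - Z).sum(axis=1)), "<=",
      2 * norm_p((eps * X).sum(axis=1)))
```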

Answer:

Since Michael has answered the first one, I will write essentially the proof of the second, taken from "Probability in Banach Spaces" by Ledoux and Talagrand (as mentioned by Teresa Lisbon in a comment).

We have that

$$E_{\epsilon}\left[\left|\sum_{i=1}^n a_i \varepsilon_i \beta_i y_i \right|^p\right] \leq E_{\epsilon}\left[\left|\sum_{i=1}^n a_i \varepsilon_i y_i\right|^p\right]$$ whenever $|\beta_i| \leq 1$ and the $y_i$ are fixed real numbers. This follows from convexity of $(\beta_1, ..., \beta_n) \mapsto E_{\epsilon}\left[\left|\sum_{i=1}^n a_i \varepsilon_i \beta_i y_i \right|^p\right]$: a convex function on the cube $[-1,1]^n$ attains its maximum at an extreme point, i.e. at some vertex $\eta \in \{-1,1\}^n$, and by the symmetry of the $\varepsilon_i$'s the function takes the same value at every vertex, namely the right-hand side. Now, note that since $Y_i$ dominates $X_i$ stochastically, there exists a coupling $\pi$ of all of them, independent of the $\epsilon$'s, such that $\beta_i := \frac{X_i}{Y_i} \leq 1$ almost surely (take for instance the quantile coupling $X_i = F_{X_i}^{-1}(U_i)$, $Y_i = F_{Y_i}^{-1}(U_i)$ with the same uniform $U_i$; by abuse of notation, set $0/0 = 0$, and note that under this coupling $Y_i = 0$ implies $X_i = 0$). So, in particular, conditionally on this set of $X_i$'s and $Y_i$'s we have

$$E_{\epsilon}\left[\left|\sum_{i=1}^n a_i \varepsilon_i X_i \right|^p\right] \leq E_{\epsilon}\left[\left|\sum_{i=1}^n a_i \varepsilon_i Y_i\right|^p\right]$$ $\pi$-almost surely. Taking the expectation with respect to $\pi$ and then the $p$-th root finishes the proof.
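In the scalar case, the coupling used above can be taken to be the inverse-CDF (quantile) coupling: feed the same uniform variable through both quantile functions. A minimal sketch, with an assumed $\mathrm{Exp}(2)$/$\mathrm{Exp}(1)$ pair:

```python
# Quantile coupling: if P(Y >= t) >= P(X >= t) for all t, then
# X = F_X^{-1}(U) and Y = F_Y^{-1}(U) with the same U satisfy X <= Y a.s.
import numpy as np

rng = np.random.default_rng(3)
U = rng.uniform(size=100_000)
X = -np.log(1 - U) / 2  # quantile function of Exp(rate 2)
Y = -np.log(1 - U)      # quantile function of Exp(rate 1)
print(np.all(X <= Y))   # True: X <= Y almost surely under this coupling
```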