Concentration of the $\ell_2$ error of the empirical distribution


Let $X$ be a random variable that takes values only in the set $\{1,2, \dots, m\}$ such that $\Pr(X = i) = p_i$ for all $i = 1,2, \dots, m$. Let $S = \{X_1, X_2, \dots, X_n\}$ be a set of $n$ i.i.d. realisations of $X$. We can then construct the empirical distribution $\{\hat{p}_i\}_{i = 1}^m$ using the samples from $S$, where $\hat{p}_i = \frac{1}{n} |\{ j : X_j = i\}|$.
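As a concrete illustration of the setup (a minimal sketch assuming NumPy; all names and parameter values here are illustrative, not from the question), the empirical distribution can be computed as:

```python
import numpy as np

# Hypothetical example: m categories, n i.i.d. draws of X.
rng = np.random.default_rng(0)
m, n = 5, 1000
p = np.full(m, 1.0 / m)               # true distribution (uniform, for illustration)
samples = rng.choice(m, size=n, p=p)  # X_1, ..., X_n, coded as 0, ..., m-1

# p_hat_i = |{j : X_j = i}| / n
p_hat = np.bincount(samples, minlength=m) / n
```

Here `p_hat` is the vector $(\hat{p}_1, \dots, \hat{p}_m)$, which sums to 1 by construction.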

I am interested in concentration inequalities for the random variable $Z := Y - \mathbb{E}[Y]$, where $Y = \| \hat{p} - p \|^2_2$ is the squared $\ell_2$ norm of the difference between the empirical and the true distribution. I have found results in the literature on the concentration of $\| \hat{p} - p\|_1$, $D(\hat{p} \| p)$ and $\sup_{i} |\hat{p}_i - p_i |$, where $D(p\|q)$ is the KL divergence between $p$ and $q$. However, quite surprisingly, I haven't come across any result on the concentration of the $\ell_2$ norm. While it is possible to directly apply the results for the other norms, the resulting bounds are not tight enough for my purposes.

Any leads or references will be appreciated. Thanks!

1 Answer

Note that if $F_n$ denotes the empirical c.d.f., $F$ the true c.d.f., and $i^{\star} = \operatorname{argmax}_{1 \leq i \leq m} |\hat{p}_i - p_i|$, then:

1. $\|\hat{p}_n - p\|_2 \leq \sqrt{m}\, \|\hat{p}_n - p\|_{\infty}$, since for any vector in $\mathbb{R}^m$ the $\ell_2$ norm is at most $\sqrt{m}$ times the $\ell_\infty$ norm;
2. $\|\hat{p}_n - p\|_{\infty} = |\hat{p}_{i^{\star}} - p_{i^{\star}}| = \big|\big(F_n(i^{\star}) - F_n(i^{\star}-1)\big) - \big(F(i^{\star}) - F(i^{\star}-1)\big)\big| \leq 2 \|F_n - F\|_{\infty}$;
3. by the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, $\mathbb{P}\big(\|F_n - F\|_{\infty} > s\big) \leq 2 e^{-2 n s^2}$ for all $s > 0$.

By these three points (in order), we have, for all $t > 0$, \begin{align*} \mathbb{P}\Big(\|\hat{p}_n - p\|_2 > t\Big) \leq \mathbb{P}\Big(\|\hat{p}_n - p\|_{\infty} > \frac{t}{\sqrt{m}}\Big) \leq \mathbb{P}\Big(\|F_n - F\|_{\infty} > \frac{t}{2 \sqrt{m}}\Big) \leq 2 e^{- \frac{n t^2}{2 m}}. \end{align*}
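This tail bound is easy to sanity-check numerically. The following sketch (assuming NumPy; the particular values of $m$, $n$, and $t$ are illustrative) estimates $\mathbb{P}(\|\hat{p}_n - p\|_2 > t)$ by Monte Carlo and compares it against $2 e^{-n t^2 / (2m)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, t = 4, 500, 0.15          # illustrative parameters
p = np.full(m, 1.0 / m)         # true distribution (uniform, for illustration)
trials = 2000

exceed = 0
for _ in range(trials):
    samples = rng.choice(m, size=n, p=p)
    p_hat = np.bincount(samples, minlength=m) / n
    if np.linalg.norm(p_hat - p) > t:   # ||p_hat - p||_2 > t ?
        exceed += 1

empirical_prob = exceed / trials
bound = 2 * np.exp(-n * t**2 / (2 * m))  # the DKW-based bound above
```

With these parameters the bound is about $0.49$, while the empirical exceedance probability is essentially zero, consistent with the bound being valid but loose (the $\ell_2 \to \ell_\infty$ comparison costs a factor $\sqrt{m}$).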

Furthermore, if you want concentration about the mean, note that $x\mapsto \sqrt{x}$ is concave, so Jensen's inequality yields \begin{align*} \mathbb{E}\Big[\|\hat{p}_n - p\|_2\Big] &\leq \sqrt{\mathbb{E}\Big[\sum_{i=1}^m |\hat{p}_i - p_i|^2\Big]} \\ &= \sqrt{\sum_{i=1}^m \frac{1}{n^2} \mathbb{E}\Big[\Big|\sum_{j=1}^n \mathbf{1}_{\{X_j = i\}} - n p_i\Big|^2\Big]} \\ &\leq \sqrt{m \cdot \frac{1}{n^2} \cdot n \max_{1 \leq i \leq m} p_i (1 - p_i)} \quad \text{since } \sum_{j=1}^n \mathbf{1}_{\{X_j = i\}}\sim \text{Binomial}(n, p_i) \\ &\leq \sqrt{\frac{m}{4 n}} \quad \text{since } \max_{x\in [0,1]} x(1-x) = 1/4. \end{align*}

Thus, by the previous concentration bound, for $t > \sqrt{\frac{m}{4n}}$ we have \begin{align} \mathbb{P}\bigg(\Big|\|\hat{p}_n - p\|_2 - \mathbb{E}\Big[\|\hat{p}_n - p\|_2\Big]\Big| > t\bigg) &\leq \mathbb{P}\Big(\|\hat{p}_n - p\|_2 + \mathbb{E}\Big[\|\hat{p}_n - p\|_2\Big] > t\Big) \\ &\leq \mathbb{P}\Big(\|\hat{p}_n - p\|_2 > t - \sqrt{\frac{m}{4 n}}\Big) \\ &\leq 2 \exp\Bigg(- \frac{n \big(t - \sqrt{\frac{m}{4 n}}\big)^2}{2 m}\Bigg) \\ &\leq 2 \exp\Big(-\frac{n t^2}{2m}\Big) \exp\Big(+ \frac{t}{2} \sqrt{\frac{n}{m}}\Big), \end{align} where the last step expands the square and drops the constant factor $e^{-1/8} \leq 1$.

In particular, if $t \sqrt{n/m} \to \infty$ as $n\to \infty$ (i.e., $t$ dominates $\sqrt{m/n}$), the correction factor is absorbed: for any given $\epsilon > 0$ and $n$ large enough, \begin{align} \mathbb{P}\bigg(\Big|\|\hat{p}_n - p\|_2 - \mathbb{E}\Big[\|\hat{p}_n - p\|_2\Big]\Big| > t\bigg) &\leq 2 \exp\Big(-\frac{n t^2}{(2 + \epsilon) m}\Big). \end{align} This whole reasoning remains valid even if $m$ is a function of $n$.
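The intermediate bound $\mathbb{E}\big[\|\hat{p}_n - p\|_2\big] \leq \sqrt{m/(4n)}$ can also be checked by simulation. The sketch below (assuming NumPy; the Dirichlet-random choice of $p$ and the parameter values are illustrative) averages $\|\hat{p}_n - p\|_2$ over many trials:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 6, 400, 2000      # illustrative parameters
p = rng.dirichlet(np.ones(m))    # an arbitrary true distribution on {1, ..., m}

norms = np.empty(trials)
for k in range(trials):
    samples = rng.choice(m, size=n, p=p)
    p_hat = np.bincount(samples, minlength=m) / n
    norms[k] = np.linalg.norm(p_hat - p)  # ||p_hat - p||_2

mean_l2_error = norms.mean()
jensen_bound = np.sqrt(m / (4 * n))
```

Note that the sharper intermediate quantity $\sqrt{\sum_i p_i(1-p_i)/n} \leq \sqrt{1/n}$ is often much smaller than $\sqrt{m/(4n)}$, so the gap observed in simulation is expected.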