PMF estimation: concentration inequalities for the $l_1$ and $l_\infty$ errors

Assume that you are given $n$ i.i.d. samples $X_1, \dots, X_n$ drawn from a discrete distribution $p = (p_1, \dots, p_k)$. We would like to estimate $p$ using the empirical estimator
\begin{equation} \hat{p}_i = \frac{1}{n} \sum_{j=1}^{n} \mathbb{1}_{\{X_j = i\}}. \end{equation}
Using classical Chernoff-type concentration inequalities, we can easily derive the tight bound
\begin{equation} \mathbb{P}\left(|\hat{p}_i - p_i| \geq \alpha \right) \leq 2 e^{-n\alpha^2/4} \end{equation}
for any $i \in \{1, \dots, k\}$. Can we make use of this inequality to prove tight upper bounds on the $l_1$ and $l_\infty$ errors? Precisely, we would like to find tight upper bounds on
\begin{equation} \mathbb{P}\left(\sum_{i=1}^{k} |\hat{p}_i - p_i| \geq \alpha \right) \end{equation}
and
\begin{equation} \mathbb{P}\left(\max_{i \in \{1, \dots, k\}} |\hat{p}_i - p_i| \geq \alpha \right). \end{equation}
Notice that the $|\hat{p}_i - p_i|$'s are correlated. Notice also that the vector of counts $(Y_1, \dots, Y_k)$, where
\begin{equation} Y_i = \sum_{j=1}^{n} \mathbb{1}_{\{X_j = i\}}, \end{equation}
is distributed according to a Multinomial$(n, p_1, \dots, p_k)$ distribution. The problem above is therefore equivalent to finding tight upper bounds on
\begin{equation} \mathbb{P}\left(\sum_{i=1}^{k} |Y_i - \mathbb{E}[Y_i]| \geq n\alpha \right) \end{equation}
and
\begin{equation} \mathbb{P}\left(\max_{i \in \{1, \dots, k\}} |Y_i - \mathbb{E}[Y_i]| \geq n\alpha \right), \end{equation}
where $(Y_1, \dots, Y_k) \sim$ Multinomial$(n, p_1, \dots, p_k)$. We can possibly use the bound provided in Lemma 3 of this paper, but it only holds for large $n$, and it is unclear whether that bound is tight.
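For concreteness, here is a minimal Monte Carlo sketch of the two tail probabilities in question (the distribution $p$ and all parameter values below are arbitrary choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_pmf(samples, k):
    """Empirical estimator: p_hat[i] = (1/n) * #{j : X_j = i}."""
    return np.bincount(samples, minlength=k) / len(samples)

# Arbitrary example: k = 5 symbols, n = 1000 samples per experiment.
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
n, trials, alpha = 1000, 20000, 0.05
k = len(p)

l1_exceed = linf_exceed = 0
for _ in range(trials):
    x = rng.choice(k, size=n, p=p)      # n i.i.d. samples from p
    err = np.abs(empirical_pmf(x, k) - p)
    l1_exceed += err.sum() >= alpha     # l1 error event
    linf_exceed += err.max() >= alpha   # sup-norm error event

print("P(l1 error   >= alpha) ~", l1_exceed / trials)
print("P(linf error >= alpha) ~", linf_exceed / trials)
```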
Since your alphabet is finite (of size $k$), you can use the method of types; kernel density estimation (i.e., dealing with a continuous alphabet) is much harder.
The method of types (see, for example, Section 2.1 of Dembo and Zeitouni, Large Deviations Techniques and Applications, 2nd ed., or Chapter 11 of Cover and Thomas, Elements of Information Theory, 2nd ed.) tells you that
\begin{equation} \mathbb{P}\left(\|\hat{p} - p\| \geq \alpha\right) \leq \sum_{x \,:\, \|x - p\| \geq \alpha} e^{-n D(x, p)} \leq (n+1)^k e^{-n D(p^*, p)}, \end{equation}
where the sum runs over all types $x$ with denominator $n$, $p^* = \arg\min_{\{x \,:\, \|x - p\| \geq \alpha\}} D(x, p)$, and $D(\cdot, \cdot)$ is the Kullback–Leibler divergence. By Sanov's theorem, this bound is tight in the exponent.
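To see this in action, here is a brute-force sketch (the small instance is an arbitrary choice so that the types are enumerable) that computes the exact $l_1$ tail probability, the per-type bound $e^{-n D(x, p)}$ summed over the bad types, and the cruder $(n+1)^k e^{-n D(p^*, p)}$ bound:

```python
import math
import numpy as np

def kl(q, p):
    """D(q, p): Kullback-Leibler divergence, with 0 * log 0 = 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def types(n, k):
    """All count vectors (y_1, ..., y_k) summing to n (the types)."""
    if k == 1:
        yield (n,)
        return
    for y in range(n + 1):
        for rest in types(n - y, k - 1):
            yield (y,) + rest

# Arbitrary small instance.
p = np.array([0.5, 0.3, 0.2])
n, alpha = 30, 0.3
k = len(p)

exact = per_type_sum = 0.0
d_star = math.inf
for y in types(n, k):
    q = np.array(y) / n
    if np.abs(q - p).sum() >= alpha:                # a "bad" type
        log_prob = (math.lgamma(n + 1)
                    - sum(math.lgamma(c + 1) for c in y)
                    + sum(c * math.log(w) for c, w in zip(y, p)))
        exact += math.exp(log_prob)                 # exact multinomial mass
        d = kl(q, p)
        per_type_sum += math.exp(-n * d)            # e^{-n D(x, p)} per type
        d_star = min(d_star, d)

print("exact l1 tail probability:    ", exact)
print("sum of e^{-n D} over types:   ", per_type_sum)
print("(n+1)^k e^{-n D(p*, p)} bound:", (n + 1) ** k * math.exp(-n * d_star))
```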
Pinsker's inequality says that $D(p, q) \geq 2\,TV(p, q)^2 = \frac{1}{2}\|p - q\|_1^2$, and since $TV(p, q) \geq \|p - q\|_\infty$ (take singleton sets in the definition of total variation), also $D(p, q) \geq 2\|p - q\|_\infty^2$; both bounds are tight. Plugging these into the exponent above shows that $\frac{1}{n} \log \mathbb{P}\left(\|\hat{p} - p\| \geq \alpha\right)$ behaves like $-\alpha^2/2$ in the $1$-norm case and like $-2\alpha^2$ in the sup-norm case (and, since Pinsker's inequality is tight, these exponents cannot be improved in general).
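As a quick numerical check of how sharp the Pinsker exponent is: for $k = 2$ the constrained minimization defining $D(p^*, p)$ is one-dimensional, since moving $\delta$ mass between the two symbols gives $\|x - p\|_1 = 2\delta$. A sketch at $p = (1/2, 1/2)$, where Pinsker is tight, so $D(p^*, p) \approx \alpha^2/2$ for small $\alpha$:

```python
import numpy as np

def kl(q, p):
    """D(q, p) with the convention 0 * log 0 = 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = np.array([0.5, 0.5])
for alpha in (0.05, 0.1, 0.2, 0.4):
    delta = alpha / 2                   # ||x - p||_1 = 2 * delta
    d_star = min(kl(np.array([0.5 + delta, 0.5 - delta]), p),
                 kl(np.array([0.5 - delta, 0.5 + delta]), p))
    print(f"alpha={alpha:.2f}  D(p*, p)={d_star:.5f}  alpha^2/2={alpha**2 / 2:.5f}")
```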
Now note that $\|\hat{p} - p\|$ has the bounded differences property: if you change one $X_j$, the $1$-norm changes by at most $2/n$ (two coordinates of $\hat{p}$ each move by $1/n$) and the sup-norm by at most $1/n$. The bounded differences (McDiarmid) inequality then gives bounds of $2 e^{-n\alpha^2/2}$ and $2 e^{-2n\alpha^2}$, respectively, matching the exponents given by Sanov's theorem plus Pinsker. (Strictly speaking, McDiarmid controls the deviation from the mean $\mathbb{E}\|\hat{p} - p\|$, which is $O(\sqrt{k/n})$ and therefore does not affect the exponent for fixed $\alpha$.) Thus, these bounds are tight up to sub-exponential factors.
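Finally, a sketch comparing Monte Carlo tail estimates (via the multinomial-counts formulation from the question) against these two bounds; the instance is again an arbitrary choice, and per the caveat above the comparison is only meaningful for $\alpha$ well above $\sqrt{k/n}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary instance: sample (Y_1, ..., Y_k) ~ Multinomial(n, p) directly.
p = np.array([0.4, 0.3, 0.2, 0.1])
n, trials, alpha = 500, 100_000, 0.1

counts = rng.multinomial(n, p, size=trials)   # one count vector per trial
errs = np.abs(counts / n - p)                 # |p_hat_i - p_i| per trial

print("MC    P(l1   >= alpha) ~", np.mean(errs.sum(axis=1) >= alpha))
print("bound 2 e^{-n a^2 / 2} =", 2 * np.exp(-n * alpha**2 / 2))
print("MC    P(linf >= alpha) ~", np.mean(errs.max(axis=1) >= alpha))
print("bound 2 e^{-2 n a^2}   =", 2 * np.exp(-2 * n * alpha**2))
```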