PMF estimation: concentration inequalities for the $l_1$ and $l_\infty$ errors

Assume that you are given $n$ i.i.d. samples $X_1, \dots, X_n$ drawn from a discrete distribution $p = (p_1, \dots, p_k)$. We would like to estimate $p$ using the empirical estimator
\begin{equation} \hat{p}_i = \frac{1}{n} \sum_{j=1}^{n} \mathbb{1}_{\{X_j = i\}}. \end{equation}
Using classical Chernoff-type concentration inequalities, we can easily derive the tight bound
\begin{equation} \mathbb{P}\left(|\hat{p}_i - p_i| \geq \alpha \right) \leq 2 e^{-n\alpha^2/4} \end{equation}
for any $i \in \{1, \dots, k\}$. Can we make use of this inequality to prove tight upper bounds on the $l_1$ and $l_\infty$ errors? Precisely, we would like to find tight upper bounds on
\begin{equation} \mathbb{P}\left(\sum_{i=1}^{k} |\hat{p}_i - p_i| \geq \alpha \right) \end{equation}
and
\begin{equation} \mathbb{P}\left(\max_{i \in \{1, \dots, k\}} |\hat{p}_i - p_i| \geq \alpha \right). \end{equation}
Notice that the $|\hat{p}_i - p_i|$'s are correlated. Notice also that the vector of counts $(Y_1, \dots, Y_k)$, where
\begin{equation} Y_i = \sum_{j=1}^{n} \mathbb{1}_{\{X_j = i\}}, \end{equation}
is distributed according to a Multinomial$(n, p_1, \dots, p_k)$ distribution. The problem above is therefore equivalent to finding tight upper bounds on
\begin{equation} \mathbb{P}\left(\sum_{i=1}^{k} |Y_i - \mathbb{E}[Y_i]| \geq n\alpha \right) \end{equation}
and
\begin{equation} \mathbb{P}\left(\max_{i \in \{1, \dots, k\}} |Y_i - \mathbb{E}[Y_i]| \geq n\alpha \right), \end{equation}
where $(Y_1, \dots, Y_k) \sim$ Multinomial$(n, p_1, \dots, p_k)$. We can possibly use the bound provided in Lemma 3 of this paper, but it only holds for large $n$, and it is unclear whether that bound is tight.
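For concreteness, here is a minimal Monte Carlo sketch of the two tail probabilities in question (the distribution $p$ and all parameter values below are arbitrary choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_pmf(samples, k):
    """Empirical estimator: p_hat[i] = (1/n) * #{j : X_j = i}."""
    return np.bincount(samples, minlength=k) / len(samples)

# Arbitrary example: k = 5 symbols, n = 1000 samples per experiment.
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
n, trials, alpha = 1000, 20000, 0.05
k = len(p)

l1_exceed = linf_exceed = 0
for _ in range(trials):
    x = rng.choice(k, size=n, p=p)      # n i.i.d. samples from p
    err = np.abs(empirical_pmf(x, k) - p)
    l1_exceed += err.sum() >= alpha     # l1 error event
    linf_exceed += err.max() >= alpha   # sup-norm error event

print("P(l1 error   >= alpha) ~", l1_exceed / trials)
print("P(linf error >= alpha) ~", linf_exceed / trials)
```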
Since your alphabet is finite (of size $k$), you can use the method of types; kernel density estimation (i.e., dealing with a continuous alphabet) is much harder.
The method of types (see, for example, Section 2.1 of Dembo and Zeitouni, Large Deviations Techniques and Applications, 2nd ed., or Chapter 11 of Cover and Thomas, Elements of Information Theory, 2nd ed.) tells you that
\begin{equation} \mathbb{P}\left(\|\hat{p} - p\| \geq \alpha\right) \leq \sum_{x \,:\, \|x - p\| \geq \alpha} e^{-n D(x, p)} \leq (n+1)^k e^{-n D(p^*, p)}, \end{equation}
where the sum runs over all types $x$ with denominator $n$, $p^* = \arg\min_{\{x \,:\, \|x - p\| \geq \alpha\}} D(x, p)$, and $D(\cdot, \cdot)$ is the Kullback–Leibler divergence. By Sanov's theorem, this bound is tight in the exponent.
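To see this in action, here is a brute-force sketch (the small instance is an arbitrary choice so that the types are enumerable) that computes the exact $l_1$ tail probability, the per-type bound $e^{-n D(x, p)}$ summed over the bad types, and the cruder $(n+1)^k e^{-n D(p^*, p)}$ bound:

```python
import math
import numpy as np

def kl(q, p):
    """D(q, p): Kullback-Leibler divergence, with 0 * log 0 = 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def types(n, k):
    """All count vectors (y_1, ..., y_k) summing to n (the types)."""
    if k == 1:
        yield (n,)
        return
    for y in range(n + 1):
        for rest in types(n - y, k - 1):
            yield (y,) + rest

# Arbitrary small instance.
p = np.array([0.5, 0.3, 0.2])
n, alpha = 30, 0.3
k = len(p)

exact = per_type_sum = 0.0
d_star = math.inf
for y in types(n, k):
    q = np.array(y) / n
    if np.abs(q - p).sum() >= alpha:                # a "bad" type
        log_prob = (math.lgamma(n + 1)
                    - sum(math.lgamma(c + 1) for c in y)
                    + sum(c * math.log(w) for c, w in zip(y, p)))
        exact += math.exp(log_prob)                 # exact multinomial mass
        d = kl(q, p)
        per_type_sum += math.exp(-n * d)            # e^{-n D(x, p)} per type
        d_star = min(d_star, d)

print("exact l1 tail probability:    ", exact)
print("sum of e^{-n D} over types:   ", per_type_sum)
print("(n+1)^k e^{-n D(p*, p)} bound:", (n + 1) ** k * math.exp(-n * d_star))
```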
Pinsker's inequality says that $D(p, q) \geq 2\,TV(p, q)^2 = \frac{1}{2}\|p - q\|_1^2$, and since $TV(p, q) \geq \|p - q\|_\infty$ (take singleton sets in the definition of total variation), also $D(p, q) \geq 2\|p - q\|_\infty^2$; both bounds are tight. Plugging these into the exponent above shows that $\frac{1}{n} \log \mathbb{P}\left(\|\hat{p} - p\| \geq \alpha\right)$ behaves like $-\alpha^2/2$ in the $1$-norm case and like $-2\alpha^2$ in the sup-norm case (and, since Pinsker's inequality is tight, these exponents cannot be improved in general).
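As a quick numerical check of how sharp the Pinsker exponent is: for $k = 2$ the constrained minimization defining $D(p^*, p)$ is one-dimensional, since moving $\delta$ mass between the two symbols gives $\|x - p\|_1 = 2\delta$. A sketch at $p = (1/2, 1/2)$, where Pinsker is tight, so $D(p^*, p) \approx \alpha^2/2$ for small $\alpha$:

```python
import numpy as np

def kl(q, p):
    """D(q, p) with the convention 0 * log 0 = 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = np.array([0.5, 0.5])
for alpha in (0.05, 0.1, 0.2, 0.4):
    delta = alpha / 2                   # ||x - p||_1 = 2 * delta
    d_star = min(kl(np.array([0.5 + delta, 0.5 - delta]), p),
                 kl(np.array([0.5 - delta, 0.5 + delta]), p))
    print(f"alpha={alpha:.2f}  D(p*, p)={d_star:.5f}  alpha^2/2={alpha**2 / 2:.5f}")
```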
Now note that $\|\hat{p} - p\|$ has the bounded differences property: if you change one $X_j$, the $1$-norm changes by at most $2/n$ (two coordinates of $\hat{p}$ each move by $1/n$) and the sup-norm by at most $1/n$. The bounded differences (McDiarmid) inequality then gives bounds of $2 e^{-n\alpha^2/2}$ and $2 e^{-2n\alpha^2}$, respectively, matching the exponents given by Sanov's theorem plus Pinsker. (Strictly speaking, McDiarmid controls the deviation from the mean $\mathbb{E}\|\hat{p} - p\|$, which is $O(\sqrt{k/n})$ and therefore does not affect the exponent for fixed $\alpha$.) Thus, these bounds are tight up to sub-exponential factors.
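Finally, a sketch comparing Monte Carlo tail estimates (via the multinomial-counts formulation from the question) against these two bounds; the instance is again an arbitrary choice, and per the caveat above the comparison is only meaningful for $\alpha$ well above $\sqrt{k/n}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary instance: sample (Y_1, ..., Y_k) ~ Multinomial(n, p) directly.
p = np.array([0.4, 0.3, 0.2, 0.1])
n, trials, alpha = 500, 100_000, 0.1

counts = rng.multinomial(n, p, size=trials)   # one count vector per trial
errs = np.abs(counts / n - p)                 # |p_hat_i - p_i| per trial

print("MC    P(l1   >= alpha) ~", np.mean(errs.sum(axis=1) >= alpha))
print("bound 2 e^{-n a^2 / 2} =", 2 * np.exp(-n * alpha**2 / 2))
print("MC    P(linf >= alpha) ~", np.mean(errs.max(axis=1) >= alpha))
print("bound 2 e^{-2 n a^2}   =", 2 * np.exp(-2 * n * alpha**2))
```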