I was wondering about the following situation: Suppose we have a random variable $X$ taking values in the finite set $\{1,2,\ldots,k\}$. Let the probability mass function be denoted by $f$. Suppose we draw $n$ iid samples $X_1,\ldots,X_n$ from the distribution of $X$. This gives us an "empirical probability mass function" $f_n(x):=\frac 1 n \sum_{i=1}^n\mathbb 1_{\{x\}}(X_i)$, $x\in \mathbb R$. (For $x\not\in \{1,2,\ldots,k\}$, of course $f_n(x) =f(x)=0$).
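(To make the setup concrete, here is a minimal NumPy sketch of the empirical pmf $f_n$; the particular pmf $f$ and sample size $n$ are just illustrative choices, not part of the question.)

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
f = np.array([0.1, 0.2, 0.3, 0.4])  # an illustrative pmf on {1,...,k}
n = 10_000

# Draw n i.i.d. samples from the distribution of X.
samples = rng.choice(np.arange(1, k + 1), size=n, p=f)

# f_n(x) = (1/n) * #{i : X_i = x} for x = 1,...,k.
f_n = np.bincount(samples, minlength=k + 1)[1:] / n

print(f_n.sum())                 # an empirical pmf always sums to 1
print(np.max(np.abs(f_n - f)))   # sup-norm distance, small for large n
```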
We know from the (weak) law of large numbers that, for any $x\in\{1,2,\ldots,k\}$, $f_n(x)\to f(x)=P(X=x)$ in probability. From this we can deduce that $f_n$ converges uniformly in probability to $f$ (here we use that $X$ takes only finitely many values), i.e., for any $\varepsilon>0$ we have $P(\max_{x=1,\ldots,k}|f(x)-f_n(x)|> \varepsilon)\to 0$ as $n\to\infty$.
Are there any concentration inequalities that give an upper bound, depending on $n$, $k$, and $\varepsilon>0$, for the probability $$P\Bigl(\max_{x=1,\ldots,k}|f(x)-f_n(x)|> \varepsilon\Bigr)?$$ That is, something similar to the Dvoretzky–Kiefer–Wolfowitz inequality $$P\Bigl(\sup_{x\in\mathbb R} |F_n(x) - F(x)| > \varepsilon \Bigr) \le \text{constant}\cdot e^{-2n\varepsilon^2}$$ for the empirical distribution function $F_n$ and distribution function $F$, but instead for the probability mass function?
One approach you can use is the following. Start with the union bound $$ P\left (\max_{x=1,\ldots,k} | f(x) - f_n(x)| > \epsilon\right ) \leq \sum_{i=1}^k P(|f(i) - f_n(i)| > \epsilon). $$ Next, control each term in the sum. We can write $f_n(i) = \frac{1}{n}\sum_{j=1}^n B_j$ where $B_j := 1[X_j = i]$ is a Bernoulli random variable. The $B_j$ are i.i.d. with distribution $\text{Bern}(f(i))$, since $P(X_j = i) = f(i)$ by construction.
Importantly, $$\mathbb{E}\, \frac{1}{n}\sum_{j=1}^n B_j = \frac{1}{n}\sum_{j=1}^n \mathbb{E}B_j = \frac{1}{n}\sum_{j=1}^n f(i) = f(i),$$ so $f_n(i)$ is an unbiased estimator of $f(i)$.
From here one can use a few different inequalities to control $P(|f(i) - f_n(i)| > \epsilon)$, e.g. Bernstein's inequality, Hoeffding's inequality, or Bennett's inequality. Using Hoeffding (with each $B_j$ bounded in $[0,1]$), one obtains \begin{align} P(|f(i) - f_n(i)| > \epsilon) &= P \left ( \left |\frac{1}{n}\sum_{j=1}^n B_j - \mathbb{E}\frac{1}{n}\sum_{j=1}^nB_j \right | > \epsilon \right ) \\ &= P \left ( \left |\sum_{j=1}^n B_j - \mathbb{E}\sum_{j=1}^nB_j \right | > n\epsilon \right ) \\ &\leq 2\exp\left (-\frac{2n^2\epsilon^2}{\sum_{j=1}^n 1} \right ) = 2\exp(-2n\epsilon^2), \end{align} and summing this over $i=1,\ldots,k$ in the union bound gives $$ P\left (\max_{x=1,\ldots,k} | f(x) - f_n(x)| > \epsilon\right ) \leq 2k\exp(-2n\epsilon^2). $$ Using the other inequalities instead of Hoeffding will give you slightly different results. Getting a bound in which $k$ does not appear is probably quite a bit more challenging.
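As a sanity check, here is a small Monte Carlo sketch comparing the true exceedance probability with the bound $2k\exp(-2n\epsilon^2)$; the choices of $k$, $n$, $\epsilon$, the uniform pmf, and the number of trials are all illustrative.

```python
import numpy as np

# Monte Carlo check of the union-bound/Hoeffding estimate
# P(max_x |f(x) - f_n(x)| > eps) <= 2k * exp(-2 n eps^2).
rng = np.random.default_rng(1)
k, n, eps, trials = 5, 200, 0.08, 2000
f = np.full(k, 1.0 / k)  # uniform pmf on {1,...,k}, chosen for illustration

exceed = 0
for _ in range(trials):
    samples = rng.choice(k, size=n, p=f)
    f_n = np.bincount(samples, minlength=k) / n
    if np.max(np.abs(f_n - f)) > eps:
        exceed += 1

empirical = exceed / trials
bound = 2 * k * np.exp(-2 * n * eps**2)
print(empirical, bound)  # the bound dominates the Monte Carlo frequency
```

The bound is quite loose here (as union bounds tend to be), but it decays exponentially in $n$ for fixed $\epsilon$, which is the point.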