Confidence interval construction for $f(p)$ of categorical distribution parameter $p$


Let $X_1, \dots, X_n$ be i.i.d. random variables from a categorical distribution with parameter vector $p = (p_1, \dots, p_k)$.

Suppose we’re interested in estimating $\vartheta := f(p)$ for some function $f : \mathbb R^k \to \mathbb R$, e.g. the true Gini-Simpson index of this distribution $\vartheta = f(p) = 1 - \sum_{j=1}^k p_j^2$ or the Shannon entropy $\vartheta = f(p) = -\sum_{j=1}^k p_j \log p_j$.

Is the following procedure for calculating a confidence interval valid? And is it reasonable — at least for some understudied $f$?

Procedure

  1. Calculate a simultaneous $(1-\alpha)\cdot 100$% confidence interval $C = C(X_1, \dots, X_n) \subset [0,1]^k$ for the parameter vector $p$ using an established method. That is,

    $$ \mathbb P (p \in C ) \geq 1-\alpha. $$

  2. Find $L := \inf\{ f(q): q \in C,\,\sum_{j=1}^k q_j=1 \}$ and $U := \sup\{ f(q): q \in C,\,\sum_{j=1}^k q_j=1 \}$. Set $[L, U]$ to be the proposed confidence interval for $\vartheta$.
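A minimal sketch of the two steps in Python, assuming Bonferroni-adjusted Clopper–Pearson intervals for step 1 (any established simultaneous method would do) and a multistart local solver for step 2; all function names and the counts are mine, and for concave $f$ the minimization side is non-convex, so this is illustrative rather than a guaranteed global optimizer:

```python
import numpy as np
from scipy import stats, optimize

def gini_simpson(q):
    return 1.0 - np.sum(q**2)

def bonferroni_box(counts, alpha=0.05):
    """Step 1 (one established choice): Clopper-Pearson intervals at level
    alpha/k per category give a conservative simultaneous box C for p."""
    counts = np.asarray(counts, float)
    n, k = counts.sum(), len(counts)
    a = alpha / k
    lo = np.nan_to_num(stats.beta.ppf(a / 2, counts, n - counts + 1), nan=0.0)
    hi = np.nan_to_num(stats.beta.ppf(1 - a / 2, counts + 1, n - counts), nan=1.0)
    return lo, hi

def f_interval(f, lo, hi, n_starts=16, seed=0):
    """Step 2: optimize f over {q : lo <= q <= hi, sum(q) = 1} with a local
    solver; multistart because for concave f the minimum sits on the boundary
    and a single centered start can stall at the interior maximum."""
    k = len(lo)
    cons = [{"type": "eq", "fun": lambda q: q.sum() - 1.0}]
    bnds = list(zip(lo, hi))
    rng = np.random.default_rng(seed)
    starts = [np.clip(np.full(k, 1.0 / k), lo, hi)]
    starts += [np.where(rng.random(k) < 0.5, lo, hi) for _ in range(n_starts)]
    lows, highs = [], []
    for s in starts:
        r1 = optimize.minimize(f, s, bounds=bnds, constraints=cons, method="SLSQP")
        r2 = optimize.minimize(lambda q: -f(q), s, bounds=bnds, constraints=cons,
                               method="SLSQP")
        lows.append(r1.fun)
        highs.append(-r2.fun)
    return min(lows), max(highs)

counts = [30, 50, 20]            # hypothetical observed counts, n = 100, k = 3
lo, hi = bonferroni_box(counts, alpha=0.05)
L, U = f_interval(gini_simpson, lo, hi)
print(L, U)
```

Since the Gini-Simpson index is globally maximized at the uniform vector, $U$ can never exceed $1 - 1/k$ no matter what box step 1 produces.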

Reasoning

$$ \begin{aligned} 1 - \alpha &\leq \mathbb P (p \in C) \\ &= \mathbb P \bigg( \underbrace{p \in C,\quad \sum_{j=1}^k p_j = 1}_{\Rightarrow \,\vartheta = f(p) \in \{f(q)\,:\,q \in C,\, \sum_j q_j = 1\}} \bigg) \leq \mathbb P(L \leq \vartheta \leq U). \end{aligned} $$

So the coverage of $[L, U]$ as a confidence interval for $\vartheta$ is at least as good as that of $C$ for $p$, though perhaps overly conservative.
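For a concrete $f$ the coverage claim can be checked by simulation. A sketch for the Gini-Simpson index, with a Bonferroni-adjusted Clopper–Pearson box as my (assumed) choice of $C$: here both endpoints are computed exactly — the sup by water-filling (the minimizer of $\sum_j q_j^2$ over the box-simplex intersection has the form $q_j = \operatorname{clip}(\lambda, \ell_j, u_j)$), the inf by evaluating the vertices, where a concave function attains its minimum:

```python
import numpy as np
from itertools import product
from scipy import stats

def bonferroni_box(counts, alpha):
    """Clopper-Pearson intervals at level alpha/k: a conservative box for p."""
    counts = np.asarray(counts, float)
    n, k = counts.sum(), len(counts)
    a = alpha / k
    lo = np.nan_to_num(stats.beta.ppf(a / 2, counts, n - counts + 1), nan=0.0)
    hi = np.nan_to_num(stats.beta.ppf(1 - a / 2, counts + 1, n - counts), nan=1.0)
    return lo, hi

def gs_upper(lo, hi):
    """sup of 1 - sum(q^2): minimize ||q||^2 over the box cap simplex via
    water-filling q_j = clip(lam, lo_j, hi_j), bisecting lam so sum(q) = 1."""
    a, b = 0.0, 1.0
    for _ in range(200):
        lam = (a + b) / 2
        if np.clip(lam, lo, hi).sum() < 1.0:
            a = lam
        else:
            b = lam
    q = np.clip((a + b) / 2, lo, hi)
    return 1.0 - np.sum(q**2)

def gs_lower(lo, hi, tol=1e-12):
    """inf of a concave f over a polytope is attained at a vertex; vertices of
    the box cap simplex pin all but at most one coordinate at a bound."""
    k, best = len(lo), np.inf
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for bits in product((0, 1), repeat=k - 1):
            q = np.empty(k)
            for j, b in zip(others, bits):
                q[j] = hi[j] if b else lo[j]
            q[i] = 1.0 - q[others].sum()
            if lo[i] - tol <= q[i] <= hi[i] + tol:
                best = min(best, 1.0 - np.sum(q**2))
    return best

rng = np.random.default_rng(1)
p, n, alpha, reps = np.array([0.5, 0.3, 0.2]), 200, 0.10, 300
theta = 1.0 - np.sum(p**2)
hits = 0
for _ in range(reps):
    lo, hi = bonferroni_box(rng.multinomial(n, p), alpha)
    hits += gs_lower(lo, hi) - 1e-9 <= theta <= gs_upper(lo, hi) + 1e-9
print(hits / reps)   # empirical coverage; the argument says >= 1 - alpha up to MC error
```

In runs like this the empirical coverage sits well above $1-\alpha$, consistent with the "perhaps overly conservative" remark.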

Thoughts (Why bother with this?)

  • Say we’re interested in testing whether $\vartheta = \vartheta_\text{max} := \max\{f(q) : q \in [0,1]^k, \sum_{j=1}^k q_j = 1\}$. One idea is to compute a confidence interval for $\vartheta$ and check whether $\vartheta_\text{max}$ lies in it. But, for example, (percentile) bootstrapping with the maximum likelihood estimator for $p$ won’t work well, even for large samples, since functions like the Gini-Simpson index or Shannon entropy are maximized exactly when $p_j = 1/k$ for all $j = 1, \dots, k$. In fact, if $n$ is not divisible by $k$, then neither the original sample nor bootstrapped samples of the same size can attain maximum entropy or Gini-Simpson index via plug-in estimation with the ML estimator for $p$. So even if $\vartheta = \vartheta_\text{max}$ and $n$ is large, this naive test will always reject the true hypothesis $\vartheta = \vartheta_\text{max}$.

  • The proposed procedure seems like a reasonable go-to method in such situations, especially when confidence intervals for $f(p)$ are understudied. If $f$ is convex or concave, we can use convex programming for one optimal value and Bauer’s maximum principle to find the other by looking at the extreme points of the polyhedron $\{q \in [0,1]^k : q_i \in [\ell_i, u_i] \, \forall i,\: \sum_i q_i = 1\}$, where we assume that the simultaneous confidence interval $C$ is a $k$-dimensional rectangle.
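The extreme-point route can be made concrete: every vertex of the rectangle-simplex intersection has at least $k-1$ coordinates pinned at a bound, with the remaining one forced by the sum constraint, so for small $k$ the vertices can simply be enumerated. A sketch with a made-up rectangle $C$ (not derived from data), using the concave Gini-Simpson index, whose infimum over the polytope is attained at a vertex by Bauer's maximum principle:

```python
import numpy as np
from itertools import product

def extreme_points(lo, hi, tol=1e-12):
    """Vertices of {q : lo <= q <= hi, sum(q) = 1}: at least k-1 coordinates
    sit at a box bound, the remaining one is forced by the sum constraint."""
    k, pts = len(lo), []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for bits in product((0, 1), repeat=k - 1):
            q = np.empty(k)
            for j, b in zip(others, bits):
                q[j] = hi[j] if b else lo[j]
            q[i] = 1.0 - q[others].sum()
            if lo[i] - tol <= q[i] <= hi[i] + tol:
                pts.append(q)
    return np.unique(np.round(pts, 10), axis=0)

# hypothetical rectangle C, k = 3
lo = np.array([0.2, 0.1, 0.1])
hi = np.array([0.6, 0.5, 0.4])
V = extreme_points(lo, hi)
gini = 1.0 - np.sum(V**2, axis=1)
L = gini.min()        # inf of the concave f over C cap simplex, by Bauer
print(V)
print(L)
```

The enumeration checks $k \cdot 2^{k-1}$ candidates, so it is only practical for modest $k$; for larger $k$ one would want a dedicated vertex-enumeration or mixed-integer formulation instead.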