Can we derive a formula to calculate pointwise confidence intervals for empirical distribution functions via subsampling?

139 Views Asked by Bumbble Comm At 25 Feb 2026 - 11:24

I am trying to derive a formula for pointwise confidence intervals of empirical distribution functions using Hartigan's subsampling approach. The idea is similar to the bootstrap, but it does not resample with replacement. Instead, it uses all $2^N - 1$ non-empty subsamples to compute confidence intervals and/or estimate variances.

Assume that we have $N \in \mathbb{N}$ realizations, say $X_1, X_2, \ldots, X_N$, indexed so that $X_i \leq X_j$ whenever $i \leq j$ (I assume there are no ties, so that exactly $K$ realizations are less than or equal to $X_K$). We want to compute the subsampling confidence interval for $\hat{F}(X_K)$, the value the empirical distribution function attains at the point $X_K$. To achieve this, we need the subsampling CDF of $\hat{F}(X_K)$. It is not obvious what this CDF looks like, but it is relatively easy to state the subsampling PMF:

$$ p(x)= \frac{1}{2^N - 1} \;\; \cdot \sum_{(k, n) \, \in \, \mathcal{X}(x)} {K \choose k} {N - K \choose n - k} $$ where $$\mathcal{X}(x) = \lbrace (k, n) \in \mathbb{N}^2 : n \geq 1 \, \text{ and } \, \frac{k}{n} = x \, \text{ and } \, 0 \leq n - k \leq N - K \, \text{ and } \, k \leq K \rbrace, $$ so that $p(x) = 0$ whenever $\mathcal{X}(x)$ is empty. The next step is to derive a closed-form expression for the subsampling CDF from the PMF, but I have not managed to do so. Can we derive a closed-form expression for the subsampling CDF?
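For reference, the PMF above can be tabulated by brute force over all admissible pairs $(k, n)$. The following is a minimal Python sketch under the no-ties assumption; the function name `subsampling_pmf` is only illustrative and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def subsampling_pmf(N, K):
    """Return {ratio: probability} for the subsampling distribution of F_hat(X_K).

    Assumes distinct observations, so exactly K of the N points are <= X_K.
    """
    total = 2**N - 1  # number of non-empty subsamples
    counts = {}
    for k in range(K + 1):            # points drawn from the K smallest
        for m in range(N - K + 1):    # points drawn from the remaining N - K
            n = k + m                 # subsample size
            if n == 0:
                continue              # the empty subsample is excluded
            x = Fraction(k, n)        # success ratio, automatically reduced
            counts[x] = counts.get(x, 0) + comb(K, k) * comb(N - K, m)
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


pmf = subsampling_pmf(7, 5)
for x, p in pmf.items():
    print(x, p)
```

Summing the returned probabilities in increasing order of the ratio gives the subsampling CDF numerically, which is useful as a check on any candidate closed form.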

Note:

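Each summand is the unnormalized hypergeometric PMF, i.e. the number of subsamples of size $n$ that contain exactly $k$ of the $K$ smallest realizations. The key difference from the hypergeometric distribution is that this PMF aggregates (i.e. sums) all configurations $(k, n)$ with the same success ratio $k / n$.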

Update:

I was able to obtain the following explicit formula for the PMF: $$ p \left( \frac{a}{b} \right) = \frac{1}{2^N - 1} \;\; \cdot \sum^{M(a, \, b)}_{i = 1} {K \choose a \cdot i} {N - K \choose (b - a) \cdot i} $$ where $$M(a, b) = \begin{cases} \left \lfloor \frac{K}{a} \right \rfloor & \frac{a}{b} \geq \frac{K}{N} \, , \\ \\ \left \lfloor \frac{N - K}{b - a} \right \rfloor & \text{otherwise} \, , \end{cases} $$ for any coprime $(a, b) \in \mathbb{N}^2$ with $1 \leq b \leq N$ and $a \leq \min(b, K)$. Again, the sum is implicitly taken to be zero whenever $M(a, b) < 1$.
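The explicit formula can be evaluated directly. Below is a small Python sketch; the name `pmf_explicit` and the argument order are illustrative choices, and the comparison $a/b \geq K/N$ is done in integers as $aN \geq bK$ to avoid floating-point issues.

```python
from fractions import Fraction
from math import comb, gcd


def pmf_explicit(a, b, N, K):
    """Evaluate p(a/b) from the explicit sum, for coprime (a, b)."""
    assert 1 <= K <= N and b >= 1 and 0 <= a <= b and gcd(a, b) == 1
    if a * N >= b * K:              # a/b >= K/N, which implies a >= 1 here
        M = K // a
    else:                           # a/b < K/N, which implies b - a >= 1
        M = (N - K) // (b - a)
    # range(1, M + 1) is empty when M < 1, so the sum is zero in that case
    s = sum(comb(K, a * i) * comb(N - K, (b - a) * i) for i in range(1, M + 1))
    return Fraction(s, 2**N - 1)


print(pmf_explicit(2, 3, 7, 5))
```

For $N = 7$ and $K = 5$, the ratio $2/3$ receives contributions from $i = 1$ and $i = 2$, namely ${5 \choose 2}{2 \choose 1} + {5 \choose 4}{2 \choose 2} = 20 + 5 = 25$.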

After several algebraic transformations involving Pochhammer symbols, this expression can be rewritten as: $$ p \left( \frac{a}{b} \right) = \begin{cases} \dfrac{2^K - 1}{2^N - 1} & \frac{a}{b} = 1 \, , \\ \\ \dfrac{{}_b F_{b - 1}\left(\alpha \,; \beta \,; (-1)^b\right) - 1}{2^N - 1} & \text{otherwise} \, , \end{cases} $$ where $$ \begin{align} \alpha &= \left \lbrace 1 - \frac{K + i}{a} : i \in \lbrace 1, 2, \ldots, a \rbrace \right \rbrace \cup \left \lbrace \frac{K + i - N}{b - a} : i \in \lbrace 0, 1, \ldots, b - a - 1 \rbrace \right \rbrace \, , \\ \beta &= \left \lbrace \frac{i}{a} : i \in \lbrace 1, 2, \ldots, a \rbrace \right \rbrace \cup \left \lbrace \frac{i}{b - a} : i \in \lbrace 1, 2, \ldots, b - a - 1 \rbrace \right \rbrace \end{align} $$ with ${}_p F_q$ denoting the generalized hypergeometric function. Again, this expression is valid only for coprime $(a, b) \in \mathbb{N}^2$ that satisfy $a \leq K$ and $b - a \leq N - K$.
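Because one of the upper parameters in $\alpha$ is always a non-positive integer, the hypergeometric series terminates and can be summed exactly. The following Python sketch does this with rational arithmetic; both function names are hypothetical, and the series evaluator is a plain term-by-term sum, not a call into any special-function library.

```python
from fractions import Fraction


def terminating_pfq(alpha, beta, z):
    """Sum a generalized hypergeometric series that terminates.

    Assumes some upper parameter is a non-positive integer and that no
    lower parameter is a non-positive integer.
    """
    total, term, i = Fraction(0), Fraction(1), 0
    while term != 0:
        total += term
        ratio = Fraction(z, i + 1)   # z / (i + 1) supplies the 1 / i! factor
        for u in alpha:
            ratio *= u + i           # advance each upper Pochhammer symbol
        for v in beta:
            ratio /= v + i           # advance each lower Pochhammer symbol
        term *= ratio
        i += 1
    return total


def pmf_hypergeometric(a, b, N, K):
    """Evaluate p(a/b) via the hypergeometric form, for coprime (a, b)."""
    if a == b:                       # ratio 1, i.e. a = b = 1
        return Fraction(2**K - 1, 2**N - 1)
    alpha = [1 - Fraction(K + i, a) for i in range(1, a + 1)]
    alpha += [Fraction(K + i - N, b - a) for i in range(b - a)]
    beta = [Fraction(i, a) for i in range(1, a + 1)]
    beta += [Fraction(i, b - a) for i in range(1, b - a)]
    return (terminating_pfq(alpha, beta, (-1) ** b) - 1) / (2**N - 1)


print(pmf_hypergeometric(2, 3, 7, 5))
```

For $N = 7$, $K = 5$ and $a/b = 2/3$, the series is ${}_3F_2\left(-2, -\tfrac{5}{2}, -2 \,; \tfrac{1}{2}, 1 \,; -1\right) = 1 + 20 + 5 = 26$, which matches the numerator $25$ after subtracting one.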

However, I am currently not able to simplify this formula further or use it to derive a closed-form expression for the subsampling CDF.

Example:

If we choose $N = 7$ and $K = 5$, the feasible success ratios are: $$ \left \lbrace 0, \frac{1}{3}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{5}{7}, \frac{3}{4}, \frac{4}{5}, \frac{5}{6}, 1 \right \rbrace \, . $$ Probabilities of these ratios are: $$ \left \lbrace \frac{3}{127},\frac{5}{127},\frac{20}{127},\frac{10}{127},\frac{25}{127},\frac{1}{127},\frac{20}{127},\frac{10}{127},\frac{2}{127},\frac{31}{127} \right \rbrace \, . $$
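To illustrate how the CDF would be used once it is available, the sketch below accumulates the probabilities listed above and reads off an equal-tailed interval from the quantiles. The 90% level and the percentile-style construction are assumptions made purely for illustration; they are not necessarily the interval that Hartigan's method prescribes.

```python
from fractions import Fraction
from itertools import accumulate

# Feasible ratios and probability numerators for N = 7, K = 5, as listed above
ratios = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(3, 5),
          Fraction(2, 3), Fraction(5, 7), Fraction(3, 4), Fraction(4, 5),
          Fraction(5, 6), Fraction(1)]
numerators = [3, 5, 20, 10, 25, 1, 20, 10, 2, 31]  # each over 2**7 - 1 = 127

# Subsampling CDF evaluated at each feasible ratio
cdf = [Fraction(c, 127) for c in accumulate(numerators)]


def quantile(q):
    """Smallest feasible ratio whose cumulative probability is at least q."""
    return next(x for x, c in zip(ratios, cdf) if c >= q)


level = Fraction(9, 10)              # assumed confidence level
lower = quantile((1 - level) / 2)
upper = quantile((1 + level) / 2)
print(lower, upper)
```

The cumulative numerators are $3, 8, 28, 38, 63, 64, 84, 94, 96, 127$, so with this construction the interval for $N = 7$, $K = 5$ is $\left[\tfrac{1}{3}, 1\right]$.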

I consulted a few online resources such as OEIS to see whether these numerators follow a well-known pattern, but without success.
