Showing an equivalence with Kolmogorov-Smirnov statistic

217 Views Asked by At

Background Information:

Starting with the sample $X_1,\ldots, X_{N}$ and sort the sample so that $X_1\leq X_2\leq \cdots \le X_N$. In our case the data set $x_1 = 0.2$, $x_2 = 0.6$, $x_3 = 0.7$. Suppose $X\sim \mathcal{U}(0,1)$ then the cumulative distribution function for $X$ is $$F(x) = x $$ We have $$D_N = \sum_{-\infty < x < \infty}|F_N(x) - F(x)|$$ where $$F_N(x) = \begin{cases} 0 \ &\text{if } x < X_1\\ k/N \ &\text{if } X_k\leq x < X_{k+1}\\ 1 \ &\text{if } x > X_N \end{cases}$$ The first and last terms are $$\sup_{x < X_1}|-F(x)| = F(X_1)$$ $$\sup_{x > X_N} |1 - F(x)| = 1 - F(X_N)$$ For the other terms, observe that $$\sup_{X_k\leq x < X_{k+1}}\left|\frac{k}{N} - F(x)\right| = \max\left(F(X_{k+1}) - \frac{k}{N},\frac{k}{N} - F(X_k)\right); k = 1, \ldots, N - 1$$

$D^{+}$ and $D^{-}$: $$\begin{aligned}D_{N}&=\max\left[F(X_{1}),\max_{k=1,\ldots,N-1}\left(F(X_{k+1})-\tfrac{k}{N},\tfrac{k}{N}-F(X_{k}),1-F(X_{N})\right)\right]\\ &= \max\left[\underbrace{\max_{k=1,\ldots,N}\left(\tfrac{k}{N}-F(X_{k})\right)}_{D^{+}},\underbrace{\max_{k=1,\ldots,N}\left(F(X_{k})-\tfrac{k-1}{N}\right)}_{D^{-}}\right]\\&=\max\{D^{+},D^{-}\}\end{aligned}$$ The formulas simplify if $X\sim U(0,1)$ since then $F(x)=x$.

Question:

Let $x_1,\ldots x_n$ be real numbers in $(0,1)$. Show that the Kolmogorov-Smirnov statistic of $x_1,\ldots x_n$ with the null hypothesis that the numbers follow the $U(0,1)$ distribution is precisely the star-discrepancy of these numbers. Having this equivalence in mind, compare and contrast the two statements "$\{x_n\}$ is a pseudorandom sequence from the uniform distribution $U(0,1)$ that passes the frequency test" with $"\{x_n\}$ is a u.d. mod $1$ sequence".

I am not really sure how to proceed with this, any suggestions are greatly appreciated.

1

There are 1 best solutions below

2
On BEST ANSWER

Is star discrepancy defined as

$ \Delta_N = \sup_{x \in [0, 1]}\left|\frac{1}{N}\sum_{n=1}^{N} \mathbb{1}_{[0, x]}(x_n)-x\right|$ ?

In this case, note that $\frac{1}{N}|\sum_{n=1}^{N}\mathbb{1}_{[0,x]}(x_n) - x| = \frac{1}{N}|k-Nx| = |\frac{k}{N} - x|$, where $k$ is the number of $x_n<x$. This is equal to $|F_N(x)-F(x)|$ in the notation above.