Two-Sample Kolmogorov-Smirnov Test


2.3k Views Asked by Bumbble Comm At 27 Mar 2026 - 11:13

I am trying to understand the two-sample Kolmogorov-Smirnov test. I cannot find good examples anywhere that connect the math to a real example, especially one with two different distributions. Does someone know where to find such an example, or can someone give me one?

I have added an example to my question and would like to check whether my understanding is correct. How do I calculate the p-value?

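| ID | Sample X | Sample Y | Cum F(X) | Cum F(Y) | Diff |
|---|---|---|---|---|---|
| 1 | 4 | 1 | 0.026490066 | 0.008196721 | 0.018293345 |
| 2 | 28 | 18 | 0.21192053 | 0.155737705 | 0.056182825 |
| 3 | 24 | 25 | 0.370860927 | 0.360655738 | 0.010205189 |
| 4 | 21 | 5 | 0.509933775 | 0.401639344 | 0.108294431 |
| 5 | 23 | 13 | 0.662251656 | 0.508196721 | 0.154054934 |
| 6 | 12 | 7 | 0.741721854 | 0.56557377 | 0.176148084 |
| 7 | 7 | 20 | 0.78807947 | 0.729508197 | 0.058571273 |
| 8 | 23 | 13 | 0.940397351 | 0.836065574 | 0.104331777 |
| 9 | 9 | 20 | 1 | 1 | 0 |
| Sum | 151 | 122 | | | |
| Count | 9 | 9 | | | |

| Quantity | Value |
|---|---|
| D-stat | 0.176148084 |
| D-crit | 0.64021448 |
| Significance | No |

Significance: No means $H_0$, the samples come from P; Yes means $H_1$, the samples do not come from P.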

To explain in math I did the following:

I have two samples (X and Y) and I would like to test if their distributions are the same.

$X$ = Sample X
$Y$ = Sample Y
$F(X_i) = \frac{X_i}{N}$: observed cumulative frequency distribution of a random sample of $n$ observations; (no. of observations ≤ X)/(sum of observations)
$F(Y_i) = \frac{Y_i}{N}$: observed cumulative frequency distribution of a random sample of $n$ observations; (no. of observations ≤ Y)/(sum of observations)
$n_X = \sum_{i=1}^{n}{X_i}$; $n_Y = \sum_{i=1}^{n}{Y_i}$
$D_{\text{stat}} = \max |F(X) - F(Y)|$
$D_{\text{crit}} = c(\alpha)\sqrt{\frac{n_X+n_Y}{n_X n_Y}}$
Hypothesis check: if $D_{\text{stat}} > D_{\text{crit}}$, $H_0$ is rejected.
Significance level $\alpha = 0.05$ (95% confidence level), $c(\alpha) = 1.3581$


**Bumbble Comm** · Answer 1 · 2020-03-12 19:34:44

This process is also described on the English Wikipedia.

Construct CDFs:

Sort.

\begin{align*}
X&: (4, 7, 9, 12, 21, 23, 23, 24, 28) \\
Y&: (1, 5, 7, 13, 13, 18, 20, 20, 25)
\end{align*}

Construct CDFs. These should be your Cum F(...)s.

\begin{align*}
\mathrm{CDF}(X) &= \begin{cases}
0 & \phantom{4\leq{}} x < 4 \\
\frac{1}{9} & 4\leq x<7 \\
\frac{2}{9} & 7\leq x<9 \\
\frac{1}{3} & 9\leq x<12 \\
\frac{4}{9} & 12\leq x<21 \\
\frac{5}{9} & 21\leq x<23 \\
\frac{7}{9} & 23\leq x<24 \\
\frac{8}{9} & 24\leq x<28 \\
1 & 28 \leq x
\end{cases} \\
\mathrm{CDF}(Y) &= \begin{cases}
0 & \phantom{1\leq{}} x < 1 \\
\frac{1}{9} & 1\leq x<5 \\
\frac{2}{9} & 5\leq x<7 \\
\frac{1}{3} & 7\leq x<13 \\
\frac{5}{9} & 13\leq x<18 \\
\frac{2}{3} & 18\leq x<20 \\
\frac{8}{9} & 20\leq x<25 \\
1 & 25 \leq x
\end{cases}
\end{align*}

[Plot of the two empirical CDFs]

Now we compute $|\mathrm{CDF}(X) - \mathrm{CDF}(Y)|$, marking the global maximum with $\ast$.

$$
|\mathrm{CDF}(X) - \mathrm{CDF}(Y)| = \begin{cases}
0 & \phantom{1\leq{}} x < 1 \\
\frac{1}{9} & 1\leq x<4 \\
0 & 4\leq x<5 \\
\frac{1}{9} & 5\leq x<9 \\
0 & 9\leq x<12 \\
\frac{1}{9} & 12\leq x<18 \\
\frac{2}{9} & 18\leq x<20 \\
\frac{4}{9} \ast & 20\leq x<21 \\
\frac{1}{3} & 21\leq x<23 \\
\frac{1}{9} & 23\leq x<24 \\
0 & 24\leq x<25 \\
\frac{1}{9} & 25\leq x<28 \\
0 & 28\leq x
\end{cases}
$$
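If you want to check the hand calculation, the sort / build CDFs / take the largest gap steps can be sketched in a few lines of Python. This is only an illustration using the standard library; the helper names `ecdf` and `ks_statistic` are made up for this sketch and are not taken from any package.

```python
from bisect import bisect_right
from fractions import Fraction

# The two samples from the question, in their original order.
x = [4, 28, 24, 21, 23, 12, 7, 23, 9]
y = [1, 18, 25, 5, 13, 7, 20, 13, 20]

def ecdf(sample):
    """Return a function t -> fraction of sample values <= t (hypothetical helper)."""
    s = sorted(sample)
    n = len(s)
    return lambda t: Fraction(bisect_right(s, t), n)

def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs (hypothetical helper).

    Both ECDFs are step functions that only jump at observed values,
    so checking the pooled sample points is enough.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(t) - fb(t)) for t in set(a) | set(b))

d = ks_statistic(x, y)
print(d, float(d))  # 4/9 0.4444444444444444
```

Exact fractions are used so the result can be compared directly with the $\frac{k}{9}$ values in the case tables.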
So your test statistic is $4/9 = 0.\overline{4}$. As you have calculated, the critical value at the $\alpha = 0.05$ level is $0.64021{\dots}$. Since the test statistic is less than the critical value, the null hypothesis (that the two samples are drawn from the same distribution) is not rejected.
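The critical value can be reproduced the same way. The sketch below assumes the usual large-sample approximation $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$, which is not stated in this thread but does give the $1.3581$ quoted in the question; the function name `ks_critical_value` is hypothetical. Note that $n_X$ and $n_Y$ here are the sample sizes (9 and 9), not the sums of the observations.

```python
import math

def ks_critical_value(n_x, n_y, alpha=0.05):
    """Approximate two-sample KS critical value (hypothetical helper).

    Assumes the large-sample form c(alpha) = sqrt(-0.5 * ln(alpha / 2)).
    n_x and n_y are the numbers of observations in each sample.
    """
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n_x + n_y) / (n_x * n_y))

d_stat = 4 / 9
d_crit = ks_critical_value(9, 9)   # roughly 0.6402
reject_h0 = d_stat > d_crit        # False: H0 is not rejected
print(round(d_crit, 4), reject_h0)
```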
$p$-values are typically either provided by software or found in tables. For example, KolmogorovSmirnovTest[] in Mathematica 11.3 finds the $p$-value for this test statistic for samples of sizes $(9,9)$ is $0.27396{\dots}$. The R {stats} package implements the test and $p$-value computation in ks.test. Python's SciPy implements these calculations as scipy.stats.ks_2samp(). There is even an Excel implementation called KS2TEST.
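For two samples of the same size $n$ and a statistic $D = h/n$, there is also a closed-form exact two-sided $p$-value under the assumption of continuous distributions (no ties): $2\sum_{k\ge 1}(-1)^{k+1}\binom{2n}{n-kh}\big/\binom{2n}{n}$. The sketch below evaluates it with the standard library only; the function name is hypothetical, and because the data here contain ties the result should be read as approximate. Packages differ in whether they use an exact or an asymptotic method, so the number need not match the Mathematica figure quoted above.

```python
from math import comb

def ks_2samp_exact_pvalue(h, n):
    """Two-sided P(D >= h/n) for two samples of equal size n (hypothetical helper).

    Assumes continuous distributions, i.e. no ties within or between samples.
    """
    if h <= 0:
        return 1.0
    total = 0
    k = 1
    while n - k * h >= 0:
        # Alternating sum over binomial coefficients C(2n, n - k*h).
        total += (-1) ** (k + 1) * comb(2 * n, n - k * h)
        k += 1
    return 2 * total / comb(2 * n, n)

p_value = ks_2samp_exact_pvalue(h=4, n=9)  # D = 4/9 with n = 9
print(round(p_value, 4))  # 0.3517
```

Either way the $p$-value is well above $0.05$, which agrees with the critical-value comparison: the null hypothesis is not rejected.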
