Kolmogorov-Smirnov two-sample test


I want to test whether two samples are drawn from the same distribution. I generated two random arrays and used a Python function to compute the KS statistic $D$ and the two-tailed p-value $P$:

>>> import numpy as np
>>> from scipy import stats
>>> a=np.random.random_integers(1,9,4)
>>> a
array([3, 7, 4, 3])
>>> b=np.random.random_integers(1,9,5)
>>> b
array([2, 2, 3, 7, 9])
>>> stats.ks_2samp(a,b)
(0.40000000000000002, 0.75428850089034016)

From the documentation (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ks_2samp.html) I know that

$$D = 0.40000000000000002 \qquad\text{and}\qquad P = 0.75428850089034016.$$ So the p-value is about $75\%$: assuming the two samples were drawn from the same distribution, the probability of observing a $D$ at least this large is about $75\%$. (Note that $P$ is not the probability that the samples share a distribution.)

Now my question is what does $D$ tell me? And is there a simple way to calculate these two values by hand?

The Wikipedia article does not have a simple worked example with two samples, which is why I am asking here.


BEST ANSWER

One rejects the null hypothesis when the P-value is small. A common criterion is to reject if the P-value is less than 0.05.
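The P-value in the question can also be reproduced by hand. To my recollection, scipy 0.14's `ks_2samp` plugged $D$ into the asymptotic Kolmogorov distribution with the small-sample correction of Stephens (the constants 0.12 and 0.11 below come from that implementation, so treat them as an assumption about this particular scipy version). A sketch:

```python
import numpy as np

def ks_pvalue(d, n, m, terms=100):
    # Asymptotic two-sided K-S p-value with the small-sample correction
    # used in older scipy versions (Stephens):
    #   lambda = (en + 0.12 + 0.11/en) * D,  en = sqrt(n*m/(n+m)),
    #   P = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lambda^2)
    en = np.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    k = np.arange(1, terms + 1)
    return 2 * np.sum((-1) ** (k - 1) * np.exp(-2 * (k * lam) ** 2))

# The question's samples: D = 0.4 with sizes n = 4, m = 5
print(round(ks_pvalue(0.4, 4, 5), 4))  # ~0.7543
```

This matches the $P \approx 0.754$ reported by `stats.ks_2samp` in the question to several decimal places.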

In a Kolmogorov-Smirnov test, the D-statistic is the maximum vertical distance between the empirical cumulative distribution functions (ECDFs) of the two samples. (Both ECDFs take values between 0 and 1, so $D$ always lies in $[0, 1]$.)

An ECDF is made by sorting the data and plotting it along the horizontal axis. Then the ECDF is a non-decreasing stair-step function that rises by 1/n at each of the n sorted data points. An ECDF is intended to approximate the cumulative distribution function (CDF) of the probability distribution from which the data were randomly sampled.
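For the arrays in the question, $D$ can therefore be computed by hand: evaluate both ECDFs at every observed value and take the largest absolute difference. Since each ECDF is a step function, it can only change at an observed data point, so checking the union of the two samples suffices. A minimal sketch:

```python
import numpy as np

def ecdf(sample, x):
    # Fraction of sample values that are <= x
    return np.mean(sample <= x)

a = np.array([3, 7, 4, 3])
b = np.array([2, 2, 3, 7, 9])

# D can only change at an observed data point, so it is enough to
# evaluate both ECDFs at the union of the two samples.
points = np.union1d(a, b)
D = max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)
print(D)  # 0.4
```

The maximum gap occurs at $x = 2$, where the ECDF of `b` has already risen to $2/5 = 0.4$ while the ECDF of `a` is still $0$, giving $D = 0.4$ as in the question.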

It is often difficult to distinguish between two distributions with small amounts of data, so the test is more revealing if you generate your simulated data with larger sample sizes.

Below is a session in R, in which x and y come from the same distribution and z comes from a different distribution. I show K-S tests to compare x and y and to compare x and z.

 x = rnorm(100, 50, 2);  y = rnorm(100, 50, 2);  z = rnorm(100, 65, 3)
 ks.test(x,y)

 #        Two-sample Kolmogorov-Smirnov test

 # data:  x and y 
 # D = 0.11, p-value = 0.5806  # Huge P-value, don't reject
 # alternative hypothesis: two.sided 

 ks.test(x,z)

 #        Two-sample Kolmogorov-Smirnov test

 # data:  x and z 
 # D = 1, p-value < 2.2e-16  # tiny P-value, so reject
 # alternative hypothesis: two.sided
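For comparison, the same experiment can be run in Python with scipy (the variable names and seed below are my own choices, not from the R session):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 2, 100)  # same distribution as y
y = rng.normal(50, 2, 100)
z = rng.normal(65, 3, 100)  # clearly different distribution

d_xy, p_xy = stats.ks_2samp(x, y)
d_xz, p_xz = stats.ks_2samp(x, z)
print(d_xy, p_xy)  # expect small D, large p-value: don't reject
print(d_xz, p_xz)  # expect D near 1, tiny p-value: reject
```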