Check for distribution of the sample with unknown parameters using ks.test in R.


242 Views Asked by Bumbble Comm At 28 Mar 2026 - 3:56

When I run ks.test in R on a sample to check which distribution it comes from, I get a $p$-value much less than 0.01 for almost every distribution I try, and I don't know why. Could it be because of the parameters I supply? Also, I have a dataset in R with two columns (samples), and ks.test even produces output for the whole dataset (when I write ks.test(x = data, ...)). How can I correct this so that the test really shows which distribution the data are drawn from?


**Bumbble Comm** · Answer 1 · 2020-05-27 22:03:36

I have no idea what data you are using or how you are choosing their supposed population distributions. Maybe a demonstration where the Kolmogorov-Smirnov test in R does work will be helpful.

Suppose I use R to take a sample of size $n = 100$ from a population known to be $\mathsf{Norm}(\mu = 100, \sigma = 15)$ and then use the K-S test to see whether the data match their parent distribution. The K-S test does not reject the null hypothesis that the data are distributed as $\mathsf{Norm}(\mu = 100, \sigma = 15).$

set.seed(527)
x = rnorm(100, 100, 15)
ks.test(x, "pnorm", 100, 15)

        One-sample Kolmogorov-Smirnov test

data:  x
D = 0.10263, p-value = 0.2428
alternative hypothesis: two-sided

Now suppose I generate a random sample of size 200 from $\mathsf{Exp}(\lambda = 0.01),$ which has mean 100 and standard deviation 100. Then I do a K-S test to see if the data are consistent with a normal distribution with $\mu = \sigma = 100.$

set.seed(2020)
y = rexp(200, 0.01)
ks.test(y, "pnorm", 100, 100)

        One-sample Kolmogorov-Smirnov test

data:  y
D = 0.15885, p-value = 8.266e-05
alternative hypothesis: two-sided

The null hypothesis that the data are from $\mathsf{Norm}(100,100)$ is rejected with a tiny P-value. The mean and variance are correct, but the K-S test detects that the shape of the distribution is wrong. However, the null hypothesis that the data are from $\mathsf{Exp}(0.01)$ is not rejected:

ks.test(y, "pexp", 0.01)$p.value
[1] 0.3032855

The K-S test works by comparing the empirical CDF (ECDF) of the sample with the CDF of the hypothetical distribution. An ECDF of continuous data is a step-function that increases by $1/n$ at each (sorted) observation.
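As a small illustrative check of the step-function description, the sketch below regenerates the exponential sample `y` from the example above and confirms that its ECDF rises by $1/n$ at each sorted observation. The names `Fn`, `ys`, and `jumps` are introduced here only for illustration, and the check assumes the sample has no tied values, which is the usual situation for continuous data.

```r
# Illustrative sketch: the ECDF of continuous data jumps by 1/n at each observation.
set.seed(2020)
y <- rexp(200, 0.01)          # same sample as in the example above
n <- length(y)

Fn <- ecdf(y)                 # ecdf() returns the ECDF as a step function
ys <- sort(y)                 # sorted observations

# Height of the step at each sorted observation (the ECDF is 0 to the left of min(y))
jumps <- diff(c(0, Fn(ys)))

# Every step should have height 1/n when there are no ties
all.equal(jumps, rep(1 / n, n))
```

With tied observations the ECDF would jump by a multiple of $1/n$ at the repeated value, so the comparison above would no longer hold exactly.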

In the plot below, the heavy dots show the ECDF of the sample. The heavy blue curve is the CDF of $\mathsf{Exp}(0.01),$ which matches the ECDF pretty well. But the CDF of $\mathsf{Norm}(100,100)$ (broken red curve) is a poor match.

plot(ecdf(y))
curve(pexp(x, 0.01), add=TRUE, col="blue", lwd=2)
curve(pnorm(x, 100, 100), add=TRUE, col="red", lwd=2, lty="dashed")

The $D$-statistic of the K-S test is the largest vertical distance between the sample ECDF and the hypothetical CDF.
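In symbols, $D = \sup_x |F_n(x) - F_0(x)|,$ where $F_n$ is the ECDF and $F_0$ is the hypothetical CDF. For a continuous $F_0$ the largest gap occurs at an observation, either just before or just after the ECDF jumps there, so $D$ can be computed from the sorted data alone. The following is a minimal sketch of that computation for the exponential example, compared with the value reported by `ks.test`; the names `F0`, `D_plus`, `D_minus`, `D_hand`, and `D_ks` are illustrative and not part of the original answer.

```r
# Illustrative sketch: compute the K-S statistic D by hand and compare with ks.test().
set.seed(2020)
y <- rexp(200, 0.01)                 # same sample as in the example above
n <- length(y)
i <- seq_len(n)

# Hypothetical CDF Exp(rate = 0.01) evaluated at the sorted observations
F0 <- pexp(sort(y), 0.01)

D_plus  <- max(i / n - F0)           # ECDF above the CDF, just after each jump
D_minus <- max(F0 - (i - 1) / n)     # CDF above the ECDF, just before each jump
D_hand  <- max(D_plus, D_minus)

# Statistic reported by ks.test() for the same null hypothesis
D_ks <- unname(ks.test(y, "pexp", 0.01)$statistic)

all.equal(D_hand, D_ks)              # the two values should agree
```

Replacing `pexp(sort(y), 0.01)` with `pnorm(sort(y), 100, 100)` gives the corresponding statistic for the normal null hypothesis.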
