How to group similar samples?

68 Views Asked by Bumbble Comm At 09 Apr 2026 - 12:00

I have several samples, some of which come from the same distribution. Let's say I have 6 samples, $S_1$ to $S_6$, where $\{S_1,S_2,S_3\}$, $\{S_4,S_5\}$, $\{S_6\}$ are groups of samples from the same distribution. What method can I use to group the samples as above?

I know I can use the two-sample Kolmogorov-Smirnov test, so my main idea is to compare every pair of samples and group those whose $p$-value is higher than a certain threshold (let's say $0.8$).

Pseudo code:

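from itertools import combinations
from scipy import stats

# samples: dict mapping the names "S1".."S6" to 1-D arrays of observations
for (name_x, x), (name_y, y) in combinations(samples.items(), 2):
    p = stats.ks_2samp(x, y).pvalue  # p-value of the two-sample KS test
    if p > 0.8:
        group(name_x, name_y)  # placeholder: record that X and Y belong together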
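
The pseudo code above leaves the `group` step open. A minimal runnable sketch of one way to fill it in, assuming that two samples are linked whenever their $p$-value exceeds the threshold and that the final groups are the connected components of those links. The function name `group_by_ks`, the dictionary layout, and the union-find bookkeeping are illustrative assumptions, not part of SciPy.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def group_by_ks(samples, threshold):
    """Link samples whose KS p-value exceeds `threshold`; return linked groups.

    `samples` is a dict {name: 1-D array}. This is an illustrative helper,
    not a library function.
    """
    names = list(samples)
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        p = stats.ks_2samp(samples[a], samples[b]).pvalue
        if p > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Synthetic data, assumed only for this demo: three normal, two uniform,
    # one exponential sample, mirroring the S1..S6 layout in the question.
    rng = np.random.default_rng(0)
    demo = {f"S{i}": rng.normal(size=200) for i in (1, 2, 3)}
    demo.update({f"S{i}": rng.uniform(size=200) for i in (4, 5)})
    demo["S6"] = rng.exponential(size=200)
    print(group_by_ks(demo, threshold=0.8))
```

Linking is transitive in this sketch, so two samples can end up in the same group through a third one even if their own pairwise test falls below the threshold.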

Question 1: According to this comment, the $p$-value is not the probability that the two samples come from the same distribution. Is this true?

Question 2: Given $n_1$ and $n_2$ as the sample sizes, is the following criterion useful, i.e. $$\frac{n_1 n_2}{n_1 + n_2}\ge 4\,?$$

The asymptotic $p$-value becomes very accurate for large sample sizes, and is believed to be reasonably accurate for sample sizes $n_1$ and $n_2$, such that $$\frac{n_1 n_2}{n_1 + n_2}\ge 4.$$

source: http://www.mathworks.com/help/stats/kstest2.html?s_tid=gn_loc_drop#outputarg_p
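
To make the quoted rule of thumb concrete, here is a small sketch that evaluates $\frac{n_1 n_2}{n_1 + n_2}$ for given sample sizes. The helper names are hypothetical and the default cutoff of 4 is taken from the quoted documentation. The check only says something about the accuracy of the asymptotic $p$-value, not about the power of the test.

```python
def effective_sample_size(n1: int, n2: int) -> float:
    """Return n1 * n2 / (n1 + n2), the quantity in the quoted rule of thumb."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    return n1 * n2 / (n1 + n2)


def asymptotic_p_reasonable(n1: int, n2: int, minimum: float = 4.0) -> bool:
    """True when the rule of thumb n1 * n2 / (n1 + n2) >= minimum is met."""
    return effective_sample_size(n1, n2) >= minimum


if __name__ == "__main__":
    for n1, n2 in [(8, 8), (5, 10), (6, 12)]:
        print(n1, n2, effective_sample_size(n1, n2), asymptotic_p_reasonable(n1, n2))
```

For example, two samples of size 8 give $64/16 = 4$ and just meet the criterion, while sizes 5 and 10 give $50/15 \approx 3.33$ and do not.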

Question 3: Is there another way to perform this task?

Update from this source

The $p$-value is the answer to this question:

If the two samples were randomly sampled from identical populations, what is the probability that the two cumulative frequency distributions would be as far apart as observed? More precisely, what is the chance that the value of the Kolmogorov-Smirnov $D$-statistic would be as large or larger than observed?
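
The definition quoted above can be turned into a direct simulation. This is a hedged sketch that estimates the probability by permutation instead of using the formula inside `ks_2samp`: pool the two samples, reshuffle them many times (which mimics "sampled from identical populations"), and count how often $D$ is at least as large as the observed value. The function names, the number of resamples, and the add-one correction are assumptions chosen for illustration.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample KS statistic D: largest gap between the two empirical CDFs."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    pooled = np.concatenate([x, y])
    # Empirical CDF of each sample evaluated at every pooled observation.
    cdf_x = np.searchsorted(x, pooled, side="right") / x.size
    cdf_y = np.searchsorted(y, pooled, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def ks_permutation_pvalue(x, y, n_resamples=2000, seed=0):
    """Estimate P(D >= D_observed) when both samples share one distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_observed = ks_statistic(x, y)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        d = ks_statistic(shuffled[: x.size], shuffled[x.size:])
        if d >= d_observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

Calling `ks_permutation_pvalue(x, y)` returns a number that answers only the question in the quote: how surprising the observed $D$ would be if the two samples shared one distribution.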

The above statement seems to contradict the comment I mentioned above, so I suppose the statement in the comment was wrong.
