I have several samples, some of which come from the same distribution. Say I have 6 samples, $S_1$ to $S_6$, where $\{S_1,S_2,S_3\}$, $\{S_4,S_5\}$, and $\{S_6\}$ are groups of samples from the same distribution. What method can I use to recover this grouping?
I know I can use the two-sample Kolmogorov-Smirnov test, so my main idea is to compare every pair of samples and group those whose $p$-value is higher than a certain threshold (let's say $0.8$).
Pseudocode:

    for X in S1..S6:
        for Y in S1..S6, Y != X:
            _, p = stats.ks_2samp(X, Y)
            if p > 0.8:
                group(X, Y)
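A minimal runnable sketch of the loop above, assuming the samples are 1-D NumPy arrays and that "grouping" means linking every pair whose $p$-value exceeds the threshold and then taking connected components. The function name `group_samples` and the transitive linking are my own assumptions, not an established method; the $0.8$ threshold is just the value from my idea above.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def group_samples(samples, threshold=0.8):
    """Link each pair whose KS p-value exceeds `threshold` and return
    the connected components as sorted lists of sample indices."""
    parent = list(range(len(samples)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each unordered pair is tested once; the test is symmetric in X and Y.
    for i, j in combinations(range(len(samples)), 2):
        p = stats.ks_2samp(samples[i], samples[j]).pvalue
        if p > threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(samples)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_samples([S1, S2, S3, S4, S5, S6])` would return lists of indices into the input list. Whether a $p$-value threshold is a sensible grouping rule at all is exactly what the questions below are about.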
Question 1: As I read in this comment, it seems like the $p$-value does not show the probability that two samples come from the same distribution. Is this true?
Question 2: Given sample sizes $n_1$ and $n_2$, is the following condition useful, i.e. $$\frac{n_1 n_2}{n_1 + n_2}\ge4\,?$$
The asymptotic $p$-value becomes very accurate for large sample sizes, and is believed to be reasonably accurate for sample sizes $n_1$ and $n_2$, such that $$\frac{n_1 n_2}{n_1 + n_2}\ge4.$$
source: http://www.mathworks.com/help/stats/kstest2.html?s_tid=gn_loc_drop#outputarg_p
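To make the condition concrete, here is a small sketch that evaluates $\frac{n_1 n_2}{n_1 + n_2}$ for a pair of sample sizes. The helper names `effective_size` and `asymptotic_ok` are hypothetical, and the cutoff of 4 is simply the one quoted above.

```python
def effective_size(n1, n2):
    """Return n1 * n2 / (n1 + n2) for two sample sizes."""
    return n1 * n2 / (n1 + n2)


def asymptotic_ok(n1, n2, cutoff=4.0):
    """True when the quoted rule of thumb n1*n2/(n1+n2) >= cutoff holds."""
    return effective_size(n1, n2) >= cutoff


# Two samples of 8 observations each sit exactly on the boundary:
# 8 * 8 / (8 + 8) = 4.
print(effective_size(8, 8), asymptotic_ok(8, 8))
# Two samples of 5 observations each fall below it: 5 * 5 / 10 = 2.5.
print(effective_size(5, 5), asymptotic_ok(5, 5))
```

As far as I can tell, the condition only concerns the accuracy of the asymptotic approximation to the $p$-value; it says nothing about how the $p$-value should be interpreted.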
Question 3: Is there another way to perform this task?
Update from this source
The $p$-value is the answer to this question:
If the two samples were randomly sampled from identical populations, what is the probability that the two cumulative frequency distributions would be as far apart as observed? More precisely, what is the chance that the value of the Kolmogorov-Smirnov $D$-statistic would be as large or larger than observed?
The quoted statement contrasts with the comment I mentioned above, so I suppose the statement in the comment was wrong.
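One way to probe the quoted definition is a small simulation: draw many pairs of samples from the same distribution and look at how the resulting $p$-values are spread. This is only a sketch under my own assumptions (standard normal data, 200 observations per sample, 1000 pairs, a fixed seed); the function name `null_pvalues` is hypothetical.

```python
import numpy as np
from scipy import stats


def null_pvalues(n_pairs=1000, n=200, seed=0):
    """KS p-values for pairs of samples drawn from the SAME distribution."""
    rng = np.random.default_rng(seed)
    return np.array([
        stats.ks_2samp(rng.normal(size=n), rng.normal(size=n)).pvalue
        for _ in range(n_pairs)
    ])


pvals = null_pvalues()
# Share of same-distribution pairs that would pass a p > 0.8 grouping rule.
print("fraction with p > 0.8: ", np.mean(pvals > 0.8))
# Share that a conventional 5% test would wrongly flag as different.
print("fraction with p < 0.05:", np.mean(pvals < 0.05))
```

If the $p$-values under identical populations are spread roughly evenly over $[0, 1]$, as the definition suggests, then only a minority of same-distribution pairs should exceed $0.8$, which would make the threshold in my pseudocode a poor grouping rule.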