Is it possible to tell if pairs of values are sampled from the same distribution?

Question


72 Views Asked by Bumbble Comm At 28 Mar 2026 - 1:07

Let's say I construct two lists, $A$ and $B$, each containing $N$ pairs of values.

For $A$, the $i$th pair of values, $(A_{i,1}, A_{i,2})$, consists of two samples from some arbitrary probability distribution. This distribution is not necessarily the same for each pair. (That is, for $i \neq j$, $A_{i,1}$ and $A_{j,1}$ need not be sampled from the same distribution.)

For $B$, the $i$th pair of values, $(B_{i,1}, B_{i,2})$, consists of one sample each from two arbitrary probability distributions.

If I gave you two lists constructed in this way, could you tell which is which?

Is the fact that $A_{i,1}$ and $A_{i,2}$ come from the same distribution (and are thus "correlated" in a way), while $B_{i,1}$ and $B_{i,2}$ do not, sufficient to distinguish the two lists, even for extremely large values of $N$?

What information at minimum is required to distinguish two lists constructed in this way as $N \to \infty$?

Original Q&A

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2020-06-04 02:21:00

If you have enough observations, and if the two distributions are sufficiently different, then it should not be difficult to distinguish between the A's and B's.

For both $A$ and $B$, take the within-pair differences. Since $A_{i,1}$ and $A_{i,2}$ come from the same distribution, the differences $D_{a,i} = A_{i,1} - A_{i,2}$ should be consistently small.

By contrast, $B_{i,1}$ and $B_{i,2}$ may, at random, come from different distributions, so the differences $D_{b,i} = B_{i,1} - B_{i,2}$ will be a mixture of large and small values, and hence have a larger variance.

A variance test on the $D_a$ vs. $D_b$ should detect the difference in variances.

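set.seed(2020)
n <- 20

mu.a <- sample(c(10, 50), n, replace = TRUE)   # one mean per pair, shared by both values
a1 <- rnorm(n, mu.a, 2)
a2 <- rnorm(n, mu.a, 2)
da <- a1 - a2                                  # within-pair differences for A

mu.b1 <- sample(c(10, 50), n, replace = TRUE)  # means drawn independently
mu.b2 <- sample(c(10, 50), n, replace = TRUE)  # for the two members of each B pair
b1 <- rnorm(n, mu.b1, 2)
b2 <- rnorm(n, mu.b2, 2)
db <- b1 - b2                                  # within-pair differences for B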

var(da); var(db)
[1] 8.959197
[1] 595.5409

var.test(da,db)

    F test to compare two variances

data:  da and db
F = 0.015044, num df = 19, denom df = 19, 
  p-value = 3.547e-13
alternative hypothesis: 
  true ratio of variances is not equal to 1
95 percent confidence interval:
  0.005954518 0.038007415
sample estimates:
ratio of variances 
         0.0150438

Your idea of looking at correlations also seems feasible.

cor(a1, a2)
[1] 0.9903569
cor(b1, b2)
[1] 0.2256975

par(mfrow=c(1,2))
 plot(a1,a2, pch=20)
 plot(b1,b2, pch=20)
par(mfrow=c(1,1))

However, I don't understand the questions about sample size: I don't see why the variances would become more alike as the sample size increases. I ran my code with $n=2000$ instead of $n=20$. The P-value reported by var.test went from nearly $0$ to exactly $0$, which probably means the true P-value is small enough to underflow.
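One way to look past the underflow is to work on the log scale. The following is a minimal sketch, not part of the original run: it repeats the simulation with $n=2000$ and uses the `log.p` argument of `pf`. Doubling the lower-tail probability is an assumption that matches a two-sided F test only when the variance ratio is below its null median, as it is in this setup.

```r
# Sketch: two-sided F-test P-value on the log scale (n = 2000).
set.seed(2020)
n <- 2000
mu.a <- sample(c(10, 50), n, replace = TRUE)
da <- rnorm(n, mu.a, 2) - rnorm(n, mu.a, 2)
mu.b1 <- sample(c(10, 50), n, replace = TRUE)
mu.b2 <- sample(c(10, 50), n, replace = TRUE)
db <- rnorm(n, mu.b1, 2) - rnorm(n, mu.b2, 2)

f.stat <- var(da) / var(db)          # far below 1 in this setup
# Natural log of the lower-tail probability, without forming p itself
log.p.lower <- pf(f.stat, n - 1, n - 1, lower.tail = TRUE, log.p = TRUE)
log.p.two.sided <- log(2) + log.p.lower
log.p.two.sided / log(10)            # P-value expressed as a power of 10
```

The last line prints the base-10 exponent of the P-value, which stays representable even when the P-value itself does not.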

And your idea of correlation also works fine with larger samples:

cor(a1,a2)
[1] 0.9902555
cor(b1,b2)
[1] 0.01700688

Notes: (1) My only (and lame) reason for not comparing the correlations with a formal test is that I didn't want to have to figure out how to do it in R. (2) A Welch t test can't tell the difference between da and db with either sample size, because it compares means, and both sets of differences have expected value $0$ here.
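Regarding note (1), one common way to compare two correlations from independent samples is Fisher's z-transformation. The following is a minimal sketch, not the answerer's code: `compare_cor` is a hypothetical helper name, and the normal approximation assumes roughly bivariate normal data. The two-cluster simulation here does not satisfy that assumption, so the resulting P-value is only a rough guide.

```r
# Sketch: compare two independent sample correlations via Fisher's z.
# compare_cor is a hypothetical helper, not a base R function.
compare_cor <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                          # Fisher z-transform of r1
  z2 <- atanh(r2)                          # Fisher z-transform of r2
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # approximate SE of z1 - z2
  z  <- (z1 - z2) / se
  p  <- 2 * pnorm(-abs(z))                 # two-sided normal P-value
  list(z = z, p.value = p)
}

# Same simulation setup as in the answer (n = 20)
set.seed(2020)
n <- 20
mu.a <- sample(c(10, 50), n, replace = TRUE)
a1 <- rnorm(n, mu.a, 2)
a2 <- rnorm(n, mu.a, 2)
mu.b1 <- sample(c(10, 50), n, replace = TRUE)
mu.b2 <- sample(c(10, 50), n, replace = TRUE)
b1 <- rnorm(n, mu.b1, 2)
b2 <- rnorm(n, mu.b2, 2)

res <- compare_cor(cor(a1, a2), n, cor(b1, b2), n)
print(res)
```

A large `z` with a small `p.value` would indicate that the within-pair correlation differs between the two lists.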

**Bumbble Comm** · Answer 2 · 2020-06-04 03:57:22

No, this is not possible. Consider any two probability density functions $f_1,f_2$. Draw all values for $A$ with density $\frac12(f_1+f_2)$. For each $i$ and $j$, independently and uniformly at random choose $r_{i,j}\in\{1,2\}$, and sample $B_{i,j}$ from $f_{r_{i,j}}$. Then no test can distinguish the lists (even if $f_1$ and $f_2$ were known): without knowledge of the $r_{i,j}$, the pairs in $A$ and the pairs in $B$ have identical joint distributions.
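The claim about identical joint distributions can be checked directly. As a short worked computation, write $m = \frac12(f_1+f_2)$ (a shorthand introduced here, not in the answer above). A pair in $A$ consists of two independent draws from $m$, so its joint density is

$$p_A(x,y) = m(x)\,m(y).$$

For a pair in $B$, averaging over the four equally likely values of $(r_{i,1}, r_{i,2})$ gives

$$p_B(x,y) = \sum_{r=1}^{2}\sum_{s=1}^{2} \frac14\, f_r(x)\, f_s(y) = \Big(\frac12\sum_{r=1}^{2} f_r(x)\Big)\Big(\frac12\sum_{s=1}^{2} f_s(y)\Big) = m(x)\,m(y).$$

Since the pairs are independent across $i$ in both constructions, the two lists have the same distribution as a whole.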
