Let's say I construct two lists, $A$ and $B$, each containing $N$ pairs of values.
For $A$, the $i$th pair of values, $(A_{i,1}, A_{i,2})$, consists of two samples from some arbitrary probability distribution. This distribution is not necessarily the same for each pair. (That is, for $i \neq j$, $A_{i,1}$ and $A_{j,1}$ are not necessarily sampled from the same distribution.)
For $B$, the $i$th pair of values, $(B_{i,1}, B_{i,2})$, consists of one sample each from two arbitrary probability distributions.
If I gave you two lists constructed in this way, could you tell which is which?
Is the fact that the values $A_{i,1}$ and $A_{i,2}$ come from the same distribution (and are thus "correlated" in a way), while $B_{i,1}$ and $B_{i,2}$ do not, sufficient to distinguish the two lists even for extremely large values of $N$?
What information at minimum is required to distinguish two lists constructed in this way as $N \to \infty$?
If you have enough observations, and if the two distributions are sufficiently different, then it should not be difficult to distinguish between the A's and B's.
For both $A$ and $B$, take the differences within each pair. $A_{i,1}$ and $A_{i,2}$ come from the same distribution, so the differences $D_{a,i} = A_{i,1} - A_{i,2}$ should be consistently small.
By contrast, $B_{i,1}$ and $B_{i,2}$ may, at random, come from different distributions, so the differences $D_{b,i} = B_{i,1} - B_{i,2}$ will be a mixture of large and small values and hence have a larger variance.
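To make that comparison concrete, here is a short calculation under assumptions that are mine, not part of the question: the two values in each pair are drawn independently, and every distribution has a finite mean and variance. Writing $\mu$ and $\sigma^2$ (with matching subscripts) for the mean and variance of the distribution behind each value,

$$E\left[(A_{i,1} - A_{i,2})^2\right] = 2\sigma_i^2, \qquad E\left[(B_{i,1} - B_{i,2})^2\right] = \sigma_{i,1}^2 + \sigma_{i,2}^2 + (\mu_{i,1} - \mu_{i,2})^2 .$$

The extra term $(\mu_{i,1} - \mu_{i,2})^2$ is what inflates the spread of the $B$ differences. It vanishes when the two distributions behind a $B$ pair share a mean, in which case the differences carry much less information.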
A variance test on the $D_a$ vs. $D_b$ should detect the difference in variances.
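As a rough illustration of the difference-and-variance idea, here is a minimal Python sketch (not the R code referred to in this answer, which is not reproduced here). Everything in it is an assumption made for illustration: the normal model with randomly drawn means, the `spread` parameter, and the helper names `make_pairs`, `differences`, and `variance_permutation_pvalue`. It also swaps the F-based variance test for a permutation test on the log variance ratio, which needs only the standard library.

```python
import math
import random
import statistics


def make_pairs(n, same_dist, rng, spread=5.0):
    """Simulate n pairs under an ASSUMED model (not from the original post).

    Each value is Normal(mu, 1), where mu is itself drawn from Normal(0, spread).
    same_dist=True  -> both values in a pair share one mu (list A).
    same_dist=False -> each value gets its own independent mu (list B).
    """
    pairs = []
    for _ in range(n):
        mu1 = rng.gauss(0.0, spread)
        mu2 = mu1 if same_dist else rng.gauss(0.0, spread)
        pairs.append((rng.gauss(mu1, 1.0), rng.gauss(mu2, 1.0)))
    return pairs


def differences(pairs):
    """Within-pair differences: first value minus second value."""
    return [x - y for x, y in pairs]


def variance_permutation_pvalue(da, db, rng, n_perm=999):
    """Two-sided permutation p-value for 'da and db have equal variance'.

    The statistic is |log(var(da) / var(db))|. Shuffling the pooled values
    is valid when, under the null, both samples come from one distribution.
    """
    def stat(u, v):
        return abs(math.log(statistics.variance(u) / statistics.variance(v)))

    observed = stat(da, db)
    pooled = list(da) + list(db)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(da)], pooled[len(da):]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(2024)
    da = differences(make_pairs(200, True, rng))
    db = differences(make_pairs(200, False, rng))
    print("var(da) =", statistics.variance(da))
    print("var(db) =", statistics.variance(db))
    print("permutation p-value =", variance_permutation_pvalue(da, db, rng))
```

Under this assumed model the $B$ differences have a much larger variance than the $A$ differences, so the permutation p-value should sit at or near its minimum of $1/(\text{n\_perm}+1)$. With distributions that differ less, the gap would shrink accordingly.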
Your idea of looking at correlations also seems feasible.
However, I don't understand the questions about sample size. I don't see how the variances would become more alike as the sample size increases. I ran my code with $n=2000$ instead of $n=20.$ The P-value of `var.test` changed from nearly $0$ to an output of just $0,$ which probably means a P-value small enough to cause underflow.
And your idea of correlation also works fine with larger samples.
Notes: (1) My only (and lame) reason for not comparing correlations with a formal test is that I didn't want to have to figure out how to do it in R. (2) A Welch t test can't tell the difference between `da` and `db` with either sample size.
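Regarding note (1), one textbook way to compare two correlations formally is the Fisher $z$-transformation. Below is a hedged Python sketch of that approach, not something I ran for this answer. It assumes the two lists are independent samples and roughly bivariate normal, and the simulation model and the helper names `simulate`, `pearson_r`, and `fisher_z_test` are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate(n, same_dist, rng, spread=5.0):
    """ASSUMED model: each value is Normal(mu, 1) with mu ~ Normal(0, spread).

    same_dist=True shares one mu within a pair (list A); False draws two (list B).
    Returns the first and second values of the pairs as two separate lists.
    """
    xs, ys = [], []
    for _ in range(n):
        mu1 = rng.gauss(0.0, spread)
        mu2 = mu1 if same_dist else rng.gauss(0.0, spread)
        xs.append(rng.gauss(mu1, 1.0))
        ys.append(rng.gauss(mu2, 1.0))
    return xs, ys


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided z test of equal correlations in two independent samples.

    Uses atanh(r) as approximately normal with variance 1 / (n - 3).
    Returns the z statistic and its two-sided p-value.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 200
    r_a = pearson_r(*simulate(n, True, rng))
    r_b = pearson_r(*simulate(n, False, rng))
    z, p = fisher_z_test(r_a, n, r_b, n)
    print("r_a =", r_a, "r_b =", r_b)
    print("z =", z, "p =", p)
```

In this assumed setup the within-pair correlation is high for $A$ and near zero for $B$, so the test should reject equality easily. As with the variance comparison, how well this works in practice depends on how much the underlying distributions actually differ from pair to pair.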