I’ve been playing with the Wilcoxon signed-rank test and came across a curious result that hopefully someone can explain.
My understanding is that differencing a sequence of independent random numbers simply doubles the variance, so $N(0,\sqrt{2})$ is equivalent to $\textrm{diff}\,N(0,1)$. In Python terms, the underlying distributions of `np.random.randn(1000)*np.sqrt(2)` and `np.diff(np.random.randn(1001))` are equivalent.
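As a quick numerical sanity check of that variance-doubling claim (a sketch of my own, not part of the original comparison; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x = rng.standard_normal(1_000_001)
d = np.diff(x)

# Var(x[i+1] - x[i]) = Var(x[i+1]) + Var(x[i]) = 2 for independent N(0,1)
print(np.var(d))  # ≈ 2
```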
However, I get very different results when I compare each distribution to $N(0,1)$. Because the means are equal, the null hypothesis is true and I would expect the p values to be uniformly distributed between 0 and 1.
This is the case when I compare $N(0,1)$ and $N(0,\sqrt{2})$. With 10,000 tests, around 100 false positives are triggered for p<0.01, as expected.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    num_trials = 10000
    wilcoxon_p = [stats.wilcoxon(np.random.randn(1000)*np.sqrt(2),
                                 np.random.randn(1000)).pvalue
                  for _ in range(num_trials)]
    plt.figure()
    plt.hist(wilcoxon_p, bins=100)
    plt.title('Two Normal Distributions')
    plt.xlabel('P value')
*The p values are uniformly distributed when comparing N(0,1) and N(0, sqrt(2)).*
However, when $N(0,1)$ and $\textrm{diff}\,N(0,1)$ are compared, no false positives are triggered and the distribution of p values is very different.
    num_trials = 10000
    wilcoxon_p = [stats.wilcoxon(np.diff(np.random.randn(1001)),
                                 np.random.randn(1000)).pvalue
                  for _ in range(num_trials)]
    plt.figure()
    plt.hist(wilcoxon_p, bins=100)
    plt.title('A normal and a differenced normal distribution')
    plt.xlabel('P value')
*The p values appear linearly distributed between zero and one when comparing N(0,1) and diff(N(0,1)).*
I don’t understand what could be causing this effect. I get similar results in MATLAB, so I expect the effect is mathematical rather than language specific. It doesn’t appear to be simply an issue with converting between the test statistic and p values, as that conversion should be comparable between the two systems.
If it helps at all, I observed a similar effect in the Mann-Whitney U test.
Can someone explain where this difference comes from?
Found the answer to my own question after digging around: I had violated the assumption that each sample is independent of the other samples. $\textrm{diff}\,N(0,1)$ is identically distributed to $N(0,\sqrt{2})$, but its samples are not independent, because adjacent differences share an underlying value, and that violates the assumptions of the test.
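The dependence is easy to see numerically. For $d_i = x_{i+1} - x_i$, adjacent differences share the middle sample, so $\mathrm{Cov}(d_i, d_{i+1}) = -\mathrm{Var}(x_{i+1}) = -1$, giving a lag-1 correlation of $-0.5$. A small sketch (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = np.diff(rng.standard_normal(1_000_001))

# Adjacent differences d[i] = x[i+1] - x[i] and d[i+1] = x[i+2] - x[i+1]
# share x[i+1], so their covariance is -Var(x[i+1]) = -1 and their
# correlation is -1/2.
lag1 = np.corrcoef(d[:-1], d[1:])[0, 1]
print(lag1)  # ≈ -0.5
```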
The expected behavior returned when I sampled every other difference, thereby avoiding the correlation between adjacent samples. In pythonic form, `np.random.randn(1000)*np.sqrt(2)` is equivalent to `np.diff(np.random.randn(2001))[0:-1:2]`.
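To confirm that taking every other difference removes the correlation (non-overlapping differences share no underlying samples), the same check can be repeated; again this is just an illustrative sketch with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)
# Every other difference: consecutive retained differences no longer
# share an underlying sample, so they are independent.
d = np.diff(rng.standard_normal(2_000_001))[0:-1:2]

lag1 = np.corrcoef(d[:-1], d[1:])[0, 1]
print(lag1)  # ≈ 0
```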