Why can t-tests on data from the same distribution give me a p-value near zero?

I was expecting a t-test on two sets of data generated by the same distribution to pretty much have a p-value of 1. I was surprised to see it could be much, much lower. Can somebody please explain why?

My Python code to demonstrate looks like this:

import scipy.stats as stats
import numpy as np

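Nsamp = 10 ** 6   # observations per sample
sigma_x = 2       # common standard deviation
mean = 100        # common mean
ps = []
for _ in range(10):  # ten independent repetitions
    # Both samples come from the same normal distribution, so the null is true.
    x = np.random.normal(mean, sigma_x, size=Nsamp)
    y = np.random.normal(mean, sigma_x, size=Nsamp)
    (t_value, p_value) = stats.ttest_ind(x, y, equal_var=True)
    ps.append(p_value)
p_mean = sum(ps) / len(ps)
print("p-values: Average {0:.3f} lowest {1:.3f}".format(p_mean, min(ps)))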

And it's not unusual for me to see something like:

p-values: Average 0.553 lowest 0.088

EDIT: It's also worth mentioning that a chi-square test on the same data consistently returns a p-value of 1.0.

There is 1 best solution below

What you're seeing is what you should expect to see.

When a point-null hypothesis is true and the statistic has a continuous distribution (and the assumptions of the procedure all hold), the p-values have a standard uniform distribution. That is, they are exactly as likely to be near 0 as they are to be near 1. This follows immediately from the definition of a p-value.

For example, here are the p-values from ten thousand equal-variance two-sample t-tests where the data came from standard normal distributions ($n_1=n_2=10$):

[Figure: histogram of 10000 p-values with a true null. It appears to be uniform.]
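A simulation along these lines can be run in a few lines. The following is a minimal sketch, assuming NumPy and SciPy are available; the seed, the variable names, and the thresholds printed at the end are illustrative choices, not details of the run that produced the histogram.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
n_tests, n = 10_000, 10

# Each row is one experiment: two independent samples of size n from N(0, 1),
# so the null hypothesis of equal means is true in every experiment.
x = rng.standard_normal((n_tests, n))
y = rng.standard_normal((n_tests, n))

# axis=1 runs one equal-variance two-sample t-test per row.
p_values = stats.ttest_ind(x, y, axis=1, equal_var=True).pvalue

# If the p-values are uniform, the fraction below any threshold alpha
# should be close to alpha itself.
for alpha in (0.01, 0.05, 0.10, 0.50):
    print(f"fraction of p < {alpha:.2f}: {np.mean(p_values < alpha):.3f}")
```

Passing `p_values` to a histogram function such as `matplotlib.pyplot.hist` should give a roughly flat picture.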

[As I said above, this uniformity is a direct consequence of the definition of the p-value.]
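To spell that step out, here is a short sketch for a one-sided test that rejects for large values of a statistic $T$ (an assumption made for simplicity; the two-sided t-test works the same way with $|T|$ in place of $T$). Let $F$ be the continuous CDF of $T$ under the null, so that the p-value is $p = 1 - F(T)$. Then for any $u \in [0, 1]$:

$$P(p \le u) = P\big(1 - F(T) \le u\big) = P\big(F(T) \ge 1 - u\big) = 1 - (1 - u) = u,$$

where the third equality uses the fact that $F(T)$ is standard uniform whenever $F$ is continuous (the probability integral transform). Since $P(p \le u) = u$ is the CDF of a standard uniform, small p-values turn up under a true null exactly as often as their size suggests: about 5% of the time $p$ falls below 0.05.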

Under the alternative (if your test has power against the alternative), the p-values "crowd" down toward zero, but you can still observe occasional large p-values:

[Figure: histogram of 10000 p-values with a false null, delta/sigma = 0.2. Many more small p-values, still some large ones.]

Here the difference in means was 1/5 of a standard deviation, but otherwise the conditions were as for the first histogram.
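The comparison between the two cases can be sketched the same way. This is an illustration under stated assumptions, not the code behind the histograms: `simulate_p_values` is a hypothetical helper, and its defaults (sample size 10, ten thousand tests, seed 0) simply mirror the setup described above.

```python
import numpy as np
from scipy import stats


def simulate_p_values(delta, n=10, n_tests=10_000, seed=0):
    """Hypothetical helper: p-values from n_tests equal-variance t-tests.

    Each test compares x ~ N(0, 1) with y ~ N(delta, 1), both of size n.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tests, n))
    y = delta + rng.standard_normal((n_tests, n))
    return stats.ttest_ind(x, y, axis=1, equal_var=True).pvalue


p_null = simulate_p_values(delta=0.0)  # null true: means are equal
p_alt = simulate_p_values(delta=0.2)   # means differ by 1/5 of a standard deviation

for name, p in (("null", p_null), ("alternative", p_alt)):
    print(f"{name}: fraction of p < 0.05 = {np.mean(p < 0.05):.3f}, "
          f"fraction of p > 0.50 = {np.mean(p > 0.50):.3f}")
```

Increasing `delta` or `n` makes the crowding toward zero more pronounced, while large p-values become rarer.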