Testing the hypothesis that two means are the same: t-test and confidence intervals give different results


I have two samples and I want to test the null hypothesis that their population means are the same, at the 5% significance level (95% confidence).

When I use a t-test, my p-value is 0.023, so I reject the null hypothesis that the means are the same and conclude there is a significant difference between the means.

However, when I calculate a 95% confidence interval for the mean of each sample individually, the two intervals overlap, which suggests to me that we do not have enough evidence to reject the null hypothesis and conclude that the means are different.

Is it possible to get different conclusions using these two methods, and if so which one should I trust more? Or have I done something wrong somewhere?

Thanks in advance


There are 2 best solutions below

1
On BEST ANSWER

The reason was that I was assuming equal variances in the t-test, whereas each confidence interval used its own sample's variance (one of which was larger than the other because of a smaller sample size).

Running a Welch two-sample t-test (which does not assume equal variances) agreed with the conclusion that there is not enough evidence to reject the null hypothesis.

0
On

First issue: "Inconsistency" between results from one t-test and two confidence intervals:

Suppose you look at the 95% confidence interval (CI) that goes with a t-test. It is a CI for the difference between the two population means. Associated with this 95% confidence, we might say there is a 5% chance of error. If you look at separate 95% CIs for the two individual means, then you have a 5% chance of error for EACH interval. These two don't exactly add up to a 10% chance of error, but looking at one CI for the difference is certainly a different procedure from looking at two separate CIs for the individual means. So you shouldn't be surprised if you get different answers, especially not in this case, which seems to be a borderline call anyhow.
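To make this concrete, here is a minimal sketch with made-up summary statistics (the sample sizes, means, and standard deviations are hypothetical, not taken from the question). It uses only the Python standard library and critical values copied from a standard t table. The two individual 95% CIs overlap, yet the 95% CI for the difference excludes zero. One way to see why: the standard error of the difference is the square root of the sum of the squared individual standard errors, which is smaller than their plain sum.

```python
import math

# Hypothetical summary statistics (NOT from the question): two groups of 10.
n1, mean1, sd1 = 10, 10.0, 1.0
n2, mean2, sd2 = 10, 11.2, 1.0

# Two-sided 95% critical values from a standard t table.
T_CRIT_DF9 = 2.262   # t_{0.975, 9}: used for each individual interval
T_CRIT_DF18 = 2.101  # t_{0.975, 18}: used for the pooled two-sample test


def mean_ci(mean, sd, n, t_crit):
    """95% CI for a single mean from summary statistics."""
    half_width = t_crit * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)


ci1 = mean_ci(mean1, sd1, n1, T_CRIT_DF9)
ci2 = mean_ci(mean2, sd2, n2, T_CRIT_DF9)
intervals_overlap = ci1[1] >= ci2[0] and ci2[1] >= ci1[0]

# Pooled two-sample t-test and the CI for the difference in means.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean2 - mean1
t_stat = diff / se_diff
ci_diff = (diff - T_CRIT_DF18 * se_diff, diff + T_CRIT_DF18 * se_diff)
reject_null = abs(t_stat) > T_CRIT_DF18

print("CI for mean 1:", ci1)
print("CI for mean 2:", ci2)
print("individual CIs overlap:", intervals_overlap)
print("t statistic:", t_stat)
print("CI for the difference:", ci_diff)
print("reject at the 5% level:", reject_null)
```

With these numbers the individual intervals overlap while the t-test rejects, so overlapping individual CIs are not, by themselves, evidence that the test result is wrong.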

Second issue: "Inconsistency" between pooled and Welch t-tests:

If the populations from which the two groups of data were drawn do not have the same variance, then the pooled t-test can very well give a misleading P-value (and hence a misleading decision about whether to reject the null hypothesis and judge the two population means to be different). [Note: You don't say anything about sample sizes, but this problem with pooled t-tests can be especially serious if the two sample sizes are markedly different.]
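As an illustration, the sketch below computes both test statistics from hypothetical summary statistics (a small, noisy group and a large, tight one; none of these numbers come from the question). It uses only the standard library. Rather than computing exact P-values, it compares each statistic against t-table values that bound the exact critical value, which is enough to settle the decision in this example.

```python
import math

# Hypothetical summary statistics: a small, noisy group and a large, tight one.
n1, mean1, sd1 = 5, 12.0, 4.0
n2, mean2, sd2 = 50, 10.0, 1.0
diff = mean1 - mean2

# Pooled t-test: assumes both populations share one variance.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

# Welch t-test: each group keeps its own variance; the degrees of freedom
# come from the Welch-Satterthwaite approximation.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Two-sided 5% critical values shrink as the degrees of freedom grow, so
# table values can bound the exact ones:
#   t_{0.975, 40} = 2.021 is above the critical value for 53 df;
#   t_{0.975, 5}  = 2.571 is below the critical value for about 4 df.
pooled_rejects = df_pooled >= 40 and abs(t_pooled) > 2.021
welch_cannot_reject = df_welch <= 5 and abs(t_welch) < 2.571

print("pooled: t =", t_pooled, "df =", df_pooled, "rejects:", pooled_rejects)
print("Welch:  t =", t_welch, "df =", df_welch,
      "cannot reject:", welch_cannot_reject)
```

Here the pooled statistic is about 2.92 on 53 degrees of freedom, while the Welch statistic is about 1.11 on roughly 4 degrees of freedom, so the two tests reach opposite decisions on the same data. This matches the pattern described in the accepted answer, where the smaller sample carries the larger variance.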

From my experience, my reading of the theory, and the results of simulation studies, I have come to the conclusion that in statistical practice one should always do the Welch test. (If the two populations have unequal variances, the Welch test is always better; and even if the two population variances happen to be nearly equal, it is difficult to see how the Welch test can be worse. It is one of life's easy decisions: always use Welch.) The pooled test takes less arithmetic to compute, but nowadays with computers the extra computation for the Welch test is not noticeable.