Clarifying the assumptions about a paired t-test


I've written my question in red ink (see links). I have two questions. Primarily, I want to know why they concluded that "there is some evidence that there is some difference in mean circulating levels of androgens" even though the t-statistic of 2.06 falls within the non-rejection region $-2.145 < t < 2.145.$ Another reason for not rejecting $H_0$ is that our p-value of 0.06 is greater than our significance level of 0.05. These two things agree with one another in not rejecting $H_0$, but they still concluded that there is some difference. Or maybe the reason they said "there is SOME evidence that there is a difference in mean" is that the t-value of 2.06 is so close to the critical value of 2.145?

My other question pertains to how they got the p-value of 0.06. It has that squiggly equal sign $\approx$, so it means roughly equal to 0.06. I've uploaded the t-distribution table. With $\nu=14$, we're finding the probability of $T > 2.06.$ Whatever we get, we need to multiply by 2 since we have a two-tailed test. I found that probability to be between 0.05 and 0.025, indicated with red horizontal and vertical lines. Halfway between those two proportions is 0.0375; multiplying by 2 (since it is a two-tailed test) gives 0.075.

Everything is illustrated in both images.

Are my assumptions correct? Sorry it's a bit lengthy but I'm just really CURIOUS.

Click here to see the question

Click here to see the t-distribution table


There are 1 best solutions below


Verification. I have transcribed the differences $d_i$ in the linked document, and checked the descriptive statistics for myself in R statistical software. The results agree with the ones given in the document.

 # differences d_i transcribed from the linked document
 d <- c(4.26, -2.08, 2.76, 0.94, 1.11, 3.21, 7.31, 13.74,
        0.52, -2.45, -0.68, -0.16, 68.03, 26.55, 24.66)
 length(d); mean(d); sd(d)
 ## 15         # sample size
 ## 9.848      # sample mean
 ## 18.47363   # sample SD
 boxplot(d, horizontal=TRUE, col="skyblue", pch=19)

[Image: horizontal boxplot of the differences d]

Notice that there are two outliers in the data, one of them substantially away from the rest of the data.

A paired t-test on the original two values for each deer is essentially a one-sample t test on the differences $d_i$. Here are the results of such a test.

 t.test(d, alternative="two.sided")

 ##        One Sample t-test

 ## data:  d 
 ## t = 2.0646, df = 14, p-value = 0.058
 ## alternative hypothesis: true mean is not equal to 0 
 ## 95 percent confidence interval:
 ##  -0.3823535 20.0783535 
 ## sample estimates:
 ## mean of x 
 ##     9.848 

The values $T = 2.0646,\; df = 14,$ and p-value $\approx 0.06$ are the same as those given in the document. So it seems everything is 'as advertised'. Now on to the specific issues raised in your question.

The p-value. Generally speaking, exact p-values cannot be determined from printed tables of Student's t distribution. Here is what can be determined from a printed table. Looking on the row for $df = \nu = n-1 = 14$ of the table, I see that $T = 2.0646$ is most closely bracketed by printed values 1.761 and 2.145. Looking at the top margin of the table, I see that these values cut probabilities 0.05 and 0.025, respectively, from the upper tail of the distribution $T(14).$ Thus for our two-sided test, we can say that the p-value is bracketed by $2(0.05) = 0.10$ and $2(0.025) = 0.05.$ Certainly, $0.06$ is in that interval, but that is all we can say for sure. In particular, interpolating halfway between the two tabled tail probabilities (which gives 0.075) is only a rough guess, because the tail probability is not a linear function of $T.$
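The table lookup described above can be mimicked in R. This is only a sketch: `qt` returns the cutoffs a printed table would show for 14 degrees of freedom, and the variable names (`crit`, `t_obs`, `p_bounds`) are mine, not from the linked document.

```r
# Upper-tail cutoffs of Student's t with 14 df for tail areas 0.05 and 0.025
crit <- qt(c(0.95, 0.975), df = 14)
round(crit, 3)

# The observed statistic lies between the two cutoffs ...
t_obs <- 2.0646
bracketed <- crit[1] < t_obs && t_obs < crit[2]
bracketed

# ... so the two-sided p-value lies between 2(0.025) and 2(0.05)
p_bounds <- 2 * c(0.025, 0.05)
p_bounds
```

A table can only deliver the interval in `p_bounds`; pinning down a single number inside it needs software.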

Software usually reports more nearly exact p-values, so the printout above shows $0.058 \approx 0.06$. An exact computation based on the $T$-statistic would be as shown below.

 2*(1 - pt(2.0646, 14))
 ## 0.05800116
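For completeness, here is a sketch of how the $T$-statistic and its two-sided p-value follow from the summary statistics alone, using the usual one-sample formula $T = \bar d / (s_d/\sqrt{n}).$ The variable names are my own choice.

```r
# Differences transcribed from the linked document (same vector as above)
d <- c(4.26, -2.08, 2.76, 0.94, 1.11, 3.21, 7.31, 13.74,
       0.52, -2.45, -0.68, -0.16, 68.03, 26.55, 24.66)

n      <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))   # sample mean over its standard error

# Two-sided p-value: twice the upper-tail area beyond |T| under t(n - 1)
p_two <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

round(t_stat, 4)
round(p_two, 3)
```

Using `lower.tail = FALSE` is equivalent to `1 - pt(...)` but avoids subtracting from 1, which matters numerically only for very small tail areas.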

Interpretation. I cannot say what was in the authors' minds when they say that there is "some evidence" that the injections raised the androgen levels of the deer. I can only give you my opinion of the data and the conclusions.

I wonder about two things. First, if the experimenters gave the injections with the purpose of raising (as opposed to altering) androgen levels, then they should be using a one-sided test. In that case, any value $T > 1.761$ would lead to rejection of the null hypothesis at the 5% level. Also, for this one-sided test the p-value would be $0.029 \approx 0.03,$ which is less than 5% and would lead to rejection at the 5% level.
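A sketch of the one-sided version in R, assuming the alternative is that the mean difference is greater than 0 (that direction is my reading of "raising"; the object names are mine):

```r
# Differences transcribed from the linked document
d <- c(4.26, -2.08, 2.76, 0.94, 1.11, 3.21, 7.31, 13.74,
       0.52, -2.45, -0.68, -0.16, 68.03, 26.55, 24.66)

# One-sided test of H0: mean = 0 against Ha: mean > 0
res_one <- t.test(d, alternative = "greater")
p_one   <- res_one$p.value

# Because T is positive, this is half of the two-sided p-value
p_two <- t.test(d, alternative = "two.sided")$p.value

round(p_one, 3)
round(p_two / 2, 3)
```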

Strictly speaking, they should not change from a two-sided test to a one-sided test after seeing the data. But I would wonder about the interpretation on these grounds, and I guess they are also having second thoughts.

Second, on account of the outliers noted above, I would wonder if a t test is the right one to use. Or, put more technically, I would wonder whether the data are normal and thus whether the $T$-statistic really has Student's t distribution with $\nu = 14.$

There are alternative tests: (a) A nonparametric Wilcoxon signed-rank test on the differences $d_i$ gives the p-value $0.012 < 0.05$ for a two-sided test and the p-value $0.006$ for a one-sided test. [Note: Here 'nonparametric' means that the data are not assumed to be normal. A Shapiro-Wilk test for normality soundly rejects the null hypothesis that the differences come from a normal distribution: p-value $8.8 \times 10^{-5}.$]
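The tests in (a) can be run along the following lines. This is a sketch using base R's `wilcox.test` and `shapiro.test`; the object names are mine, and the one-sided direction ("greater") is an assumption about what the researchers intended.

```r
# Differences transcribed from the linked document
d <- c(4.26, -2.08, 2.76, 0.94, 1.11, 3.21, 7.31, 13.74,
       0.52, -2.45, -0.68, -0.16, 68.03, 26.55, 24.66)

# Wilcoxon signed-rank test of H0: differences are symmetric about 0
w_two <- wilcox.test(d, alternative = "two.sided")
w_one <- wilcox.test(d, alternative = "greater")

# Shapiro-Wilk test of H0: differences come from a normal distribution
sw <- shapiro.test(d)

w_two$statistic   # V = sum of the ranks of |d_i| for the positive d_i
w_two$p.value
w_one$p.value
sw$p.value
```

With 15 observations and no ties or zeros among the $|d_i|$, `wilcox.test` uses the exact null distribution of $V$ rather than a normal approximation.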

(b) Another appropriate test might be a permutation test, which also avoids making the assumption that the data are normal. I have not done such a test, but I suspect that it would also show a significant difference in androgen levels after the injections.
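One common form of permutation test for paired data is the sign-flip test: under the null hypothesis each difference is equally likely to be positive or negative, so the observed mean is compared with the means obtained under every possible assignment of signs. The sketch below is one way to set that up, not a calculation taken from the linked document; the choice of the mean as test statistic and all names are assumptions of mine.

```r
# Differences transcribed from the linked document
d <- c(4.26, -2.08, 2.76, 0.94, 1.11, 3.21, 7.31, 13.74,
       0.52, -2.45, -0.68, -0.16, 68.03, 26.55, 24.66)
n <- length(d)

# Every one of the 2^15 = 32768 sign patterns, one pattern per row
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))

# Mean difference under each sign pattern (the permutation distribution)
perm_means <- as.vector(signs %*% abs(d)) / n
obs_mean   <- mean(d)

# Small tolerance so patterns that tie with the observed mean are counted
tol   <- 1e-9
p_two <- mean(abs(perm_means) >= abs(obs_mean) - tol)   # two-sided
p_one <- mean(perm_means >= obs_mean - tol)             # one-sided (greater)

p_two
p_one
```

Because all sign patterns are enumerated, the p-values are exact for this statistic; with many more observations one would sample sign patterns at random instead.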

The researchers do not have a right to 'shop around' for a test that happens to lead to rejection. However, in these circumstances, I would question the validity of the t test. And, assuming androgen levels in deer are important, I would be reluctant to abandon this line of investigation just because the t test 'barely' failed to reject at the 5% level. There is no Law of the Universe that the 5% level is the one correct dividing line between significance and non-significance.

Your question raises important issues of statistical inference. Maybe this is enough to satisfy your curiosity (for now) about the interpretation of t tests.