I've written my questions in red ink (see links). I have two questions. Primarily, I want to know why they concluded that "there is some evidence that there is some difference in mean circulating levels of androgens" even though the t-statistic of 2.06 falls within the non-rejection region $-2.145 < t < 2.145.$ Another reason for not rejecting $H_0$ is that our p-value of 0.06 is greater than our significance level of 0.05. These two things agree with one another in not rejecting $H_0,$ but they still concluded that there is some difference. Or maybe the reason they said "there is SOME evidence that there is a difference in mean" is that the t-value of 2.06 is so close to the critical value of 2.145?
Another question I have pertains to how they got the p-value of 0.06. It has that squiggly equal sign $\approx,$ so it means roughly equal to 0.06. I've uploaded the t-distribution table. With $\nu=14,$ we're finding the probability of $T > 2.06.$ Whatever we get, we need to multiply by 2 since we have a two-tailed test. I found that probability to be between 0.05 and 0.025, indicated with red horizontal and vertical lines. Halfway between those two proportions is 0.0375; multiplying by 2 (since it is a two-tailed test) gives 0.075.
Everything is illustrated in both images.
Are my assumptions correct? Sorry it's a bit lengthy but I'm just really CURIOUS.
Verification. I have transcribed the differences $d_i$ in the linked document, and checked the descriptive statistics for myself in R statistical software. The results agree with the ones given in the document.
Notice that there are two outliers in the data, one of them far from the rest of the data.
A paired t-test on the original two values for each deer is essentially a one-sample t test on the differences $d_i$. Here are the results of such a test.
The values $T = 2.0646,\; df = 14,$ and p-value $\approx 0.06$ are the same as those given in the document. So it seems everything is 'as advertised'. Now on to the specific issues raised in your question.
The p-value. Generally speaking, exact p-values cannot be determined from printed tables of Student's t distribution. Here is what can be determined from a printed table. Looking on the row for $df = \nu = n-1 = 14$ of the table, I see that $T = 2.0646$ is most closely bracketed by printed values 1.761 and 2.145. Looking at the top margin of the table, I see that these values cut probabilities 0.05 and 0.025, respectively, from the upper tail of the distribution $T(14).$ Thus for our two-sided test, we can say that the p-value is bracketed by $2(0.05) = 0.10$ and $2(0.025) = 0.05.$ Certainly, $0.06$ is in that interval, but that is all we can say for sure from the table.
Software usually reports more nearly exact p-values, so the printout above shows $0.058 \approx 0.06$. An exact computation uses the $T$-statistic and the degrees of freedom directly: the two-sided p-value is $2P(T > 2.0646)$ for $T \sim T(14).$
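One way to carry out such a computation without statistical software is to integrate the density of $T(14)$ numerically. The Python sketch below is only an illustration: the helper names `t_pdf` and `t_upper_tail`, and the choice of Simpson's rule with 2000 subintervals, are assumptions made here and do not come from the document. Statistical software would normally do this in a single call.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (n must be even)."""
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # by symmetry, P(T > 0) = 0.5

# Values taken from the test results discussed above.
t_stat, df = 2.0646, 14
p_two_sided = 2 * t_upper_tail(t_stat, df)
print(f"two-sided p-value = {p_two_sided:.4f}")
```

The result should land inside the interval $(0.05,\, 0.10)$ obtained from the printed table and agree with the reported $0.058$ up to rounding.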
Interpretation. I cannot say what was in the authors' minds when they say that there is "some evidence" that the injections raised the androgen levels of the deer. I can only give you my opinion of the data and the conclusions.
I wonder about two things. First, if the experimenters gave the injections with the purpose of *raising* (as opposed to *altering*) androgen levels, then they should be using a *one-sided test.* In that case, any value $T > 1.761$ would lead to rejection of the null hypothesis at the 5% level. Also, for this one-sided test the p-value would be $0.029 \approx 0.03,$ which is less than 5% and would lead to rejection at the 5% level.
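To make the arithmetic explicit: taking the reported two-sided value $0.058$ as given, and using the symmetry of the t distribution together with the fact that the observed $T$ is positive, the upper-tail p-value is half the two-sided one:

$$p_{\text{one-sided}} = P(T > 2.0646) = \tfrac{1}{2}\,p_{\text{two-sided}} \approx \tfrac{1}{2}(0.058) = 0.029 < 0.05, \qquad T \sim T(14).$$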
Strictly speaking, they should not change from a two-sided test to a one-sided test *after* seeing the data. But I would wonder about the interpretation on these grounds, and I guess they are also having second thoughts.
Second, on account of the outliers noted above, I would wonder if a t test is the right one to use. Or put more technically, I would wonder whether the *data are normal* and thus whether the $T$-statistic really has Student's t distribution with $\nu = 14.$
There are alternative tests: (a) A nonparametric Wilcoxon signed-rank test on the differences $d_i$ gives the p-value $0.012 < 0.05$ for a two-sided test and the p-value $0.006$ for a one-sided test. [Note: Here 'nonparametric' means that the data are not assumed to be normal. A Shapiro-Wilk test for normality soundly rejects the null hypothesis that the differences come from a normal distribution: p-value $8.8 \times 10^{-5}.$]
(b) Another appropriate test might be a permutation test, which also avoids making the assumption that the data are normal. I have not done such a test, but I suspect that it would also show a significant difference in androgen levels after the injections.
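For readers who want to see what such a test involves, here is a minimal Python sketch of one common variant, the sign-flip permutation test for paired differences. Under the null hypothesis the differences are taken to be symmetric about 0, so every pattern of sign changes is equally likely. The function name `sign_flip_p_value` and the numbers in `hypothetical_diffs` are made up for illustration; they are NOT the deer data, and the output says nothing about the androgen study.

```python
from itertools import product

def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation p-value for paired differences."""
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Enumerate all 2^n ways of attaching +/- signs to the differences.
    for signs in product((1, -1), repeat=len(diffs)):
        flipped_sum = sum(s * d for s, d in zip(signs, diffs))
        # Small tolerance guards against floating-point ties.
        if abs(flipped_sum) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total

# Made-up differences for illustration only; these are NOT the deer data.
hypothetical_diffs = [0.5, 1.2, -0.3, 2.1, 0.8, 1.5, -0.1, 0.9]
print(f"permutation p-value = {sign_flip_p_value(hypothetical_diffs):.4f}")
```

With $n = 15$ differences there are $2^{15} = 32768$ sign patterns, so full enumeration is still feasible; for much larger samples one would draw sign patterns at random instead.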
The researchers do not have a right to 'shop around' for a test that happens to lead to rejection. However, in these circumstances, I would question the validity of the t test. And, assuming androgen levels in deer are important, I would be reluctant to abandon this line of investigation just because the t test 'barely' failed to reject at the 5% level. There is no Law of the Universe that the 5% level is the one correct dividing line between significance and non-significance.
Your question raises important issues of statistical inference. Maybe this is enough to satisfy your curiosity (for now) about the interpretation of t tests.