Understanding Tukey's test results for a one-factor ANOVA

Question

2k Views Asked by Bumbble Comm At 10 May 2026 - 7:35

I ran a one-way ANOVA on a data set with 6 groups (labelled 101-106). Each group has between 6 and 8 observations, and all values are negative. I used Python and got a p-value < 0.05, which tells me that the group means are not all equal. Now I would like to know which groups differ from which, so I ran Tukey's test (also in Python), which produced the following summary table:

    group1  group2  meandiff    lower   upper   reject
0   101     102     0.2917    -0.0425   0.6259  False
1   101     103     0.1571    -0.1649   0.4792  False
2   101     104     -0.1333   -0.4675   0.2009  False
3   101     105     0.0833    -0.2509   0.4175  False
4   101     106     -0.0500   -0.3626   0.2626  False
5   102     103     -0.1345   -0.4566   0.1875  False
6   102     104     -0.4250   -0.7592  -0.0908  True
7   102     105     -0.2083   -0.5425   0.1259  False
8   102     106     -0.3417   -0.6543  -0.0290  True
9   103     104     -0.2905   -0.6125   0.0316  False
10  103     105     -0.0738   -0.3959   0.2482  False
11  103     106     -0.2071   -0.5067   0.0924  False
12  104     105     0.2167    -0.1175   0.5509  False
13  104     106     0.0833    -0.2293   0.3960  False
14  105     106     -0.1333   -0.4460   0.1793  False

If the reject column says True, we reject the null hypothesis and the means are NOT equal; if it says False, we accept the null hypothesis and the means are equal. As you can see, the result is a bit weird. For example, group 101 is not different from any of the other groups, which cannot be true, since it must be different from at least one group according to the ANOVA result. Also, groups 102 and 104 are different, but both are similar to group 103, which does not make any sense. Am I missing something? I used this method (and syntax) in the past and it worked fine.

Groups:

101: -1.45, -1.35, -1.6, -1.6, -1.65, -1.65

102: -1.5, -1.4, -1.15, -1.1, -1.25, -1.15

103: -1.5, -1.6, -1.525, -1.125, -1.2, -1.5, -1.3

104: -1.9, -1.55, -1.55, -1.7, -1.95, -1.45

105: -1.55, -1.65, -1.5, -1.3, -1.3, -1.5

106: -2, -1.4, -1.8, -1.75, -1.15, -1.7, -1.45, -1.55


**Bumbble Comm** · Accepted Answer

Thanks for the data, which I entered into Minitab, thinking it a good idea to compare the output of what ought to be standard procedures in the two statistical packages. (Your groups 101-106 appear as G1-G6 below.)

One-way ANOVA: G1, G2, G3, G4, G5, G6 

Method

Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05

Equal variances were assumed for the analysis.


Factor Information

Factor  Levels  Values
Factor       6  G1, G2, G3, G4, G5, G6


Analysis of Variance

Source  DF  Adj SS   Adj MS  F-Value  P-Value
Factor   5  0.7331  0.14662     4.00    0.006
Error   33  1.2096  0.03666
Total   38  1.9427


Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.191457  37.73%     28.30%      14.60%


Means

Factor  N     Mean   StDev        95% CI
G1      6  -1.5500  0.1225  (-1.7090, -1.3910)
G2      6  -1.2583  0.1594  (-1.4174, -1.0993)
G3      7  -1.3929  0.1830  (-1.5401, -1.2456)
G4      6  -1.6833  0.2041  (-1.8424, -1.5243)
G5      6  -1.4667  0.1402  (-1.6257, -1.3076)
G6      8  -1.6000  0.2673  (-1.7377, -1.4623)

Pooled StDev = 0.191457
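As a cross-check on the ANOVA table above, here is a minimal sketch that recomputes the sums of squares, F-value, and pooled standard deviation by hand in plain Python. The data are copied from the question; the function name `one_way_anova` and the G1-G6 keys are my own labels for illustration, not anything either package provides.

```python
# Data as listed in the question (groups 101-106 relabelled G1-G6).
groups = {
    "G1": [-1.45, -1.35, -1.6, -1.6, -1.65, -1.65],
    "G2": [-1.5, -1.4, -1.15, -1.1, -1.25, -1.15],
    "G3": [-1.5, -1.6, -1.525, -1.125, -1.2, -1.5, -1.3],
    "G4": [-1.9, -1.55, -1.55, -1.7, -1.95, -1.45],
    "G5": [-1.55, -1.65, -1.5, -1.3, -1.3, -1.5],
    "G6": [-2, -1.4, -1.8, -1.75, -1.15, -1.7, -1.45, -1.55],
}


def one_way_anova(samples):
    """Return (ss_between, ss_within, df_between, df_within, f_value)."""
    all_values = [x for xs in samples.values() for x in xs]
    grand_mean = sum(all_values) / len(all_values)
    means = {g: sum(xs) / len(xs) for g, xs in samples.items()}

    # Between-group variation: group means around the grand mean, weighted by n.
    ss_between = sum(
        len(xs) * (means[g] - grand_mean) ** 2 for g, xs in samples.items()
    )
    # Within-group variation: observations around their own group mean.
    ss_within = sum(
        (x - means[g]) ** 2 for g, xs in samples.items() for x in xs
    )

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


ss_b, ss_w, df_b, df_w, f_value = one_way_anova(groups)
pooled_sd = (ss_w / df_w) ** 0.5  # square root of the error mean square

print(f"SS factor = {ss_b:.4f}, SS error = {ss_w:.4f}")
print(f"DF = ({df_b}, {df_w}), F = {f_value:.2f}, pooled SD = {pooled_sd:.6f}")
```

The printed values should agree with the Factor and Error rows of the table to the precision shown. The P-value is not computed here because that needs the F distribution, which is not in the standard library.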

The largest differences in means are between G2 and G4 and between G2 and G6. Because the F-test is significant at the 1% level (P-value 0.006), it is reasonable to expect that at least the largest difference (between G2 and G4) is statistically significant. It is not fair to make many direct comparisons among the 95% CIs in the output and figure above, because error probabilities proliferate when many comparisons are made. However, in view of the information above, it should not be surprising that Python's Tukey procedure flags the two largest differences as significant.
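To give a rough sense of how error probabilities grow with many comparisons, the sketch below counts the pairwise comparisons among 6 groups and computes the chance of at least one false positive if each were tested separately at the 5% level. It assumes the comparisons are independent, which is only an approximation here because pairs share groups, so treat the figure as indicative rather than exact.

```python
from math import comb

k = 6          # number of groups
alpha = 0.05   # error rate for each individual comparison

m = comb(k, 2)  # number of distinct pairs of groups

# Assumption: the m tests are independent (not strictly true for pairwise
# comparisons that share a group), so this is only a rough guide.
fwer = 1 - (1 - alpha) ** m

# Bonferroni upper bound on the family-wise error rate; needs no independence.
bonferroni_bound = min(1.0, m * alpha)

print(f"{m} comparisons")
print(f"approximate family-wise error rate: {fwer:.3f}")
print(f"Bonferroni upper bound: {bonferroni_bound:.2f}")
```

Tukey's procedure is designed to hold the error rate for the whole family of comparisons at 5%, which is why it is more conservative than comparing individual 95% CIs.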

Below is Minitab's version of the Tukey procedure.

Tukey Pairwise Comparisons 

Grouping Information Using the Tukey Method and 95% Confidence

Factor  N     Mean  Grouping
G2      6  -1.2583  A
G3      7  -1.3929  A B
G5      6  -1.4667  A B
G1      6  -1.5500  A B
G6      8  -1.6000    B
G4      6  -1.6833    B

Means that do not share a letter are significantly different.
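The letter rule can be applied mechanically. As a small illustration, the sketch below encodes the grouping letters from the table above as Python sets and lists every pair of groups that shares no letter; the dictionary layout is just my own encoding of the Minitab output.

```python
from itertools import combinations

# Grouping letters copied from the Minitab table above.
grouping = {
    "G2": {"A"},
    "G3": {"A", "B"},
    "G5": {"A", "B"},
    "G1": {"A", "B"},
    "G6": {"B"},
    "G4": {"B"},
}

# A pair is declared significantly different when the two letter sets
# have an empty intersection.
significant_pairs = sorted(
    tuple(sorted(pair))
    for pair in combinations(grouping, 2)
    if not (grouping[pair[0]] & grouping[pair[1]])
)

print(significant_pairs)
```

Only the pairs involving G2 with G4 and G2 with G6 come out, since every other pair shares at least one letter.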

Thus we see that the Tukey procedure chooses exactly the two largest differences among pairs of group sample means as significant with a 'family' significance level of 5%. The Python and Minitab outputs are not in conflict.
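One way to read the Python table is that a pair is flagged exactly when its confidence interval for the difference excludes zero. The sketch below checks that against the rows copied from the question; the tuple layout and the helper `excludes_zero` are mine, for illustration only.

```python
# (group1, group2, meandiff, lower, upper, reject) copied from the question.
rows = [
    (101, 102, 0.2917, -0.0425, 0.6259, False),
    (101, 103, 0.1571, -0.1649, 0.4792, False),
    (101, 104, -0.1333, -0.4675, 0.2009, False),
    (101, 105, 0.0833, -0.2509, 0.4175, False),
    (101, 106, -0.0500, -0.3626, 0.2626, False),
    (102, 103, -0.1345, -0.4566, 0.1875, False),
    (102, 104, -0.4250, -0.7592, -0.0908, True),
    (102, 105, -0.2083, -0.5425, 0.1259, False),
    (102, 106, -0.3417, -0.6543, -0.0290, True),
    (103, 104, -0.2905, -0.6125, 0.0316, False),
    (103, 105, -0.0738, -0.3959, 0.2482, False),
    (103, 106, -0.2071, -0.5067, 0.0924, False),
    (104, 105, 0.2167, -0.1175, 0.5509, False),
    (104, 106, 0.0833, -0.2293, 0.3960, False),
    (105, 106, -0.1333, -0.4460, 0.1793, False),
]


def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


# Pairs whose interval excludes zero.
flagged = [
    (g1, g2)
    for g1, g2, _diff, lower, upper, _reject in rows
    if excludes_zero(lower, upper)
]

# Does the reject column agree with the interval rule on every row?
consistent = all(
    excludes_zero(lower, upper) == reject
    for _g1, _g2, _diff, lower, upper, reject in rows
)

print(flagged, consistent)
```

The flagged pairs are 102 vs 104 and 102 vs 106, the same two pairs (G2 vs G4, G2 vs G6) that Minitab separates by letter.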

I think it is possible that your confusion may have arisen from looking at differences among means instead of the group means themselves. Furthermore, if you are going to look at differences among means, you have to consider absolute rather than signed differences. The biggest absolute differences are declared significant by both Python and Minitab.
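To make the point about absolute differences concrete, here is a short sketch that ranks all pairs of groups by the absolute difference of their sample means, using the means from the Minitab output above. The variable names are my own.

```python
from itertools import combinations

# Group means as reported in the Minitab "Means" table.
means = {
    "G1": -1.5500,
    "G2": -1.2583,
    "G3": -1.3929,
    "G4": -1.6833,
    "G5": -1.4667,
    "G6": -1.6000,
}

# Sort pairs from largest to smallest absolute difference in means.
abs_diffs = sorted(
    ((abs(means[a] - means[b]), a, b) for a, b in combinations(means, 2)),
    reverse=True,
)

top_two = [(a, b) for _diff, a, b in abs_diffs[:2]]

for diff, a, b in abs_diffs[:4]:
    print(f"{a} vs {b}: |difference| = {diff:.4f}")
```

The two largest absolute differences are G2 vs G4 and G2 vs G6. Note that with unequal group sizes the width of each Tukey interval also depends on the two sample sizes, so ranking by absolute difference is a guide to the result rather than the decision rule itself.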
