ANOVA analysis to compare mean values

Question


44 Views Asked by Bumbble Comm At 28 Mar 2026 - 11:11

From what I have found, ANOVA can be used to compare a set of means. ANOVA depends on three main assumptions: normality, homogeneity of variance, and independence of observations.
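The first two assumptions can be checked informally in R. Independence depends on how the data were collected and cannot be tested from the values alone. Below is a minimal sketch on hypothetical exponential data; the seed, group labels and sizes are illustrative assumptions, not from the question. It uses base R's `shapiro.test`, `bartlett.test` and `fligner.test`. With groups this large, formal tests flag even small departures, so a residual plot is often more informative than the p-value.

```r
# Hypothetical data: three groups of 1000 exponential values (illustrative only)
set.seed(42)
g <- factor(rep(c("A", "B", "C"), each = 1000))
y <- rexp(3000)

fit <- lm(y ~ g)

# Normality of residuals: shapiro.test accepts between 3 and 5000 values
p_norm <- shapiro.test(residuals(fit))$p.value

# Homogeneity of variance: Bartlett's test (itself sensitive to non-normality)
p_bart <- bartlett.test(y ~ g)$p.value

# Fligner-Killeen: a rank-based test that is more robust to non-normality
p_flig <- fligner.test(y ~ g)$p.value

# Independence of observations: a property of the study design, not tested here
```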

According to the central limit theorem, when the sample size is large, the sample mean $\bar x$ is approximately normally distributed even if the distribution of $x$ itself is not normal.
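A quick simulation illustrates this for exponential data. It is a sketch, and the seed and number of replicates are arbitrary choices. The skewness of the raw data is about 2. The skewness of means of 1000 values should be close to $2/\sqrt{1000} \approx 0.06$, and their SD close to $1/\sqrt{1000} \approx 0.032$.

```r
set.seed(1)                  # arbitrary seed
n    <- 1000                 # observations per sample, as in the question
reps <- 5000                 # number of simulated samples

# Sample means of Exp(rate = 1) data, one per simulated sample
means <- replicate(reps, mean(rexp(n)))

# Moment-based skewness (0 for a normal distribution)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw   <- skew(rexp(1e5))  # should be near 2, the skewness of Exp(1)
skew_means <- skew(means)      # should be near 2 / sqrt(n)
sd_means   <- sd(means)        # should be near 1 / sqrt(n)
```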

My question is: can we use ANOVA to compare means even if the data in each group are not normally distributed, provided each data set has more than 1000 observations?

**Bumbble Comm** · Answer 1 · 2018-08-08 17:26:00

Shifted-exponential data: Here is a demonstration on particular data sets showing that ANOVA can have relatively poor power to distinguish among samples of size 1000 from slightly shifted exponential distributions, all with population SD $\sigma = 1$.

I'm not saying that ANOVA never works on exponential data; I am saying that there is good reason for the normality assumption. (Under $H_0$, the F-statistic has exactly the expected $F$ distribution only if the data are normal.)
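For reference, here is the one-way ANOVA statistic for $k$ groups. Group $i$ has $n_i$ observations $x_{ij}$ and mean $\bar x_i$, $\bar x$ is the grand mean, and $N = \sum_i n_i$ (notation introduced here for illustration). If the observations are independent and normal with a common variance, then under $H_0$ this statistic follows an $F_{k-1,\,N-k}$ distribution. Here $k = 3$ and $N = 3000$, which gives the $F_{2,\,2997}$ reference distribution behind the `Df` column of the ANOVA tables.

$$
F = \frac{\displaystyle \frac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar x_i - \bar x)^2}{\displaystyle \frac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar x_i)^2}
\;\sim\; F_{k-1,\,N-k} \quad \text{under } H_0 \text{ with normal data.}
$$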

```r
set.seed(1888); x = rexp(3000)            # Exp(rate = 1) data, so sigma = 1
d = rep((1:3)/20, each=1000); x = x + d   # shift by 1/20, 2/20, 3/20
g = as.factor(d*20)                       # group labels 1, 2, 3
```

Sample means are slightly different:

```r
mean(x[1:1000]); mean(x[1001:2000]); mean(x[2001:3000])
[1] 1.088035
[1] 1.089778
[1] 1.166204
```

ANOVA not significant:

```r
anova(lm(x~g))
Analysis of Variance Table

Response: x
            Df Sum Sq Mean Sq F value Pr(>F)
g            2    4.0  1.9924  1.8201 0.1622
Residuals 2997 3280.6  1.0946
```
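To see where the `F value` comes from, the statistic can be computed by hand from the between-group and within-group sums of squares and compared with `pf`. The sketch below rebuilds the shifted-exponential data with the same seed and shifts; the variable names are illustrative.

```r
set.seed(1888)
x <- rexp(3000) + rep((1:3) / 20, each = 1000)  # shifted-exponential data
g <- factor(rep(1:3, each = 1000))              # group labels 1, 2, 3

k <- nlevels(g)                   # number of groups
N <- length(x)                    # total number of observations
grp_mean <- tapply(x, g, mean)    # group means
n_i      <- tapply(x, g, length)  # group sizes

ss_between <- sum(n_i * (grp_mean - mean(x))^2)           # between-group SS
ss_within  <- sum((x - grp_mean[as.integer(g)])^2)        # within-group SS
F_stat <- (ss_between / (k - 1)) / (ss_within / (N - k))  # ratio of mean squares
p_val  <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)    # upper-tail F probability
```

`F_stat` and `p_val` should match the `F value` and `Pr(>F)` entries that `anova(lm(x ~ g))` reports for the same data.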

Kruskal-Wallis detects different shifts:

```r
kruskal.test(x~g)

        Kruskal-Wallis rank sum test

data:  x by g
Kruskal-Wallis chi-squared = 7.4152, df = 2, p-value = 0.02454
```
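A single seed gives one data set. To estimate power, the experiment can be repeated many times. Below is a Monte Carlo sketch; the 200 replicates and the seed are arbitrary choices, not from the answer. For a location shift in exponential data, the Pitman asymptotic relative efficiency of rank tests relative to the $F$-test is $12\sigma^2\left(\int f^2\right)^2 = 3$. Kruskal-Wallis is therefore expected to reject noticeably more often.

```r
set.seed(2024)                          # arbitrary seed
reps  <- 200                            # number of simulated data sets
g     <- factor(rep(1:3, each = 1000))  # three groups of 1000
shift <- rep((1:3) / 20, each = 1000)   # shifts 1/20, 2/20, 3/20

p_aov <- p_kw <- numeric(reps)
for (i in seq_len(reps)) {
  x <- rexp(3000) + shift               # fresh shifted-exponential sample
  p_aov[i] <- anova(lm(x ~ g))[["Pr(>F)"]][1]
  p_kw[i]  <- kruskal.test(x ~ g)$p.value
}

power_aov <- mean(p_aov < 0.05)  # estimated power of the ANOVA F-test
power_kw  <- mean(p_kw  < 0.05)  # estimated power of Kruskal-Wallis
```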

The boxplots at the left below show the three shifted-exponential samples, each of size 1000.

```r
par(mfrow=c(1,2))              # two plot panels side by side
boxplot(x~g, col="skyblue2")   # left panel: shifted-exponential samples
```

Shifted-normal data: By contrast, both tests detect the same shifts in normal populations.

```r
set.seed(1888); x = rnorm(3000); d = rep((1:3)/20, each=1000)
g = as.factor(d*20); x = x + d   # same shifts, now on standard normal data
anova(lm(x~g))
Analysis of Variance Table

Response: x
            Df  Sum Sq Mean Sq F value  Pr(>F)  
g            2    8.27  4.1346  4.1808 0.01538 *
Residuals 2997 2963.87  0.9889                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

```r
kruskal.test(x~g)

        Kruskal-Wallis rank sum test

data:  x by g
Kruskal-Wallis chi-squared = 8.7825, df = 2, p-value = 0.01239
```
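The same Monte Carlo comparison can be wrapped in a small helper that takes the data generator as an argument (`sim_power` is a hypothetical name, and the seed and replicate count are arbitrary). For normal data, the asymptotic relative efficiency of Kruskal-Wallis relative to ANOVA is $3/\pi \approx 0.955$, so the two estimated powers should be close. For exponential data, the rank test should come out well ahead.

```r
# Hypothetical helper: estimated rejection rates of ANOVA and Kruskal-Wallis
# for three groups of size n shifted by 1/20, 2/20, 3/20
sim_power <- function(rgen, reps = 200, n = 1000, alpha = 0.05) {
  g     <- factor(rep(1:3, each = n))
  shift <- rep((1:3) / 20, each = n)
  rej <- matrix(FALSE, reps, 2, dimnames = list(NULL, c("anova", "kruskal")))
  for (i in seq_len(reps)) {
    x <- rgen(3 * n) + shift
    rej[i, "anova"]   <- anova(lm(x ~ g))[["Pr(>F)"]][1] < alpha
    rej[i, "kruskal"] <- kruskal.test(x ~ g)$p.value < alpha
  }
  colMeans(rej)  # proportion of rejections for each test
}

set.seed(2024)
pow_norm <- sim_power(rnorm)  # normal data: powers should be similar
pow_exp  <- sim_power(rexp)   # exponential data: Kruskal-Wallis should win
```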

```r
boxplot(x~g, col="skyblue2")   # right panel: shifted-normal samples
par(mfrow=c(1,1))              # restore a single plot panel
```

The boxplots at the right below show the three normal samples.
