A county health department wants to know whether significant ground water contamination has occurred due to the use of a specific pesticide on farms in 5 different communities within the county. To do this, they obtain samples from 20 wells in each of these communities and determine the percentage of wells in each community that have significant pollution with this pesticide. To determine whether the communities differ in the frequency of polluted wells, what statistical test should be used?
A) paired t test B) one way ANOVA C) independent sample t test D) Chi-square test
I am choosing between one-way ANOVA and chi-square.
I know paired t test and independent t test are not the right choice.
Which one is the correct option?
Is it the chi-squared test?
Thanks
The way you phrase the problem, you are comparing five communities to see whether they have the same proportion of contaminated wells (out of $n=20$). Each well is either contaminated (C) or not (N). Your data table would have one row per community, with the counts under the headers C and N adding to 20 in each row.
This can be analyzed with a chi-squared test of homogeneity (as suggested by @New-to-this). Under the null hypothesis, the chi-squared statistic $$Q = \sum_{i=1}^5 \sum_{j=1}^2 \frac{(X_{ij} - E_{ij})^2}{E_{ij}}$$ is approximately chi-squared with $\nu = (5-1)(2-1) = 4$ degrees of freedom. The observed counts are $X_{ij}.$ Let the column totals be $T_j$, for $j = 1,2,$ with $T_1 + T_2 = 100.$ All of the row totals are $R_i = 20.$ The expected counts are $E_{ij} = R_iT_j/100 = 20T_j/100 = T_j/5$ (not rounded to integers). [If either $T_j < 25,$ so that the expected counts in that column fall below 5, then the distribution of $Q$ may not be reliably approximated by the chi-squared distribution. Simulation methods may then be used to assess significance.]
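To make the formula for $Q$ concrete, here is a minimal sketch in plain Python. The counts in `contaminated` are invented purely for illustration (they are not data from the question), and the function name `chi_squared_homogeneity` is my own choice, not a library routine.

```python
def chi_squared_homogeneity(contaminated, wells_per_community=20):
    """Return (Q, expected counts, degrees of freedom) for a k x 2 table."""
    # Each row is (C, N): contaminated and not-contaminated counts.
    observed = [(c, wells_per_community - c) for c in contaminated]
    k = len(observed)
    grand_total = k * wells_per_community
    # Column totals T_1 (all C counts) and T_2 (all N counts).
    col_totals = [sum(row[j] for row in observed) for j in range(2)]
    # E_ij = R_i * T_j / grand total; the same for every row here,
    # because all row totals are equal.
    expected_row = [wells_per_community * t / grand_total for t in col_totals]
    expected = [list(expected_row) for _ in range(k)]
    q = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(k)
        for j in range(2)
    )
    df = (k - 1) * (2 - 1)
    return q, expected, df


# Hypothetical contaminated-well counts, invented purely for illustration.
contaminated = [4, 9, 5, 3, 11]
q, expected, df = chi_squared_homogeneity(contaminated)
print(f"Q = {q:.3f}, df = {df}, expected per row = {expected[0]}")
```

With these made-up counts, $T_1 = 32$ and $T_2 = 68$, so every row has expected counts $6.4$ and $13.6$, and $Q \approx 10.85.$ The sketch assumes neither column total is zero; otherwise an expected count is zero and the division fails.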
You will reject the null hypothesis (at the 5% significance level) that all communities have the same level of contamination if the computed value $Q > 9.488.$ Rejection would mean that there are significant differences in contamination among communities.
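The decision rule, and the simulation fallback mentioned in the bracketed note, can be sketched as follows. For $\nu = 4$ the chi-squared upper-tail probability has the closed form $P(Q > q) = e^{-q/2}(1 + q/2),$ so no statistics library is needed. The simulated version shuffles the 100 C/N labels among the communities, which keeps both margins fixed; that is one common way to simulate under the null hypothesis, not the only one. The data and the function names are hypothetical.

```python
import math
import random


def q_statistic(contaminated, n=20):
    """Chi-squared statistic for a k x 2 table with n wells per row."""
    k = len(contaminated)
    exp_c = sum(contaminated) / k   # E_i1 = T_1 / k
    exp_n = n - exp_c               # E_i2 = T_2 / k
    return sum(
        (c - exp_c) ** 2 / exp_c + ((n - c) - exp_n) ** 2 / exp_n
        for c in contaminated
    )


def p_value_df4(q):
    """Upper-tail chi-squared probability; valid for 4 degrees of freedom only."""
    return math.exp(-q / 2) * (1 + q / 2)


def simulated_p_value(contaminated, n=20, reps=2000, seed=1):
    """Permutation p-value: shuffle C/N labels across communities."""
    rng = random.Random(seed)
    k = len(contaminated)
    total_c = sum(contaminated)
    labels = [1] * total_c + [0] * (k * n - total_c)
    q_obs = q_statistic(contaminated, n)
    hits = 0
    for _ in range(reps):
        rng.shuffle(labels)
        sim = [sum(labels[i * n:(i + 1) * n]) for i in range(k)]
        # Small tolerance guards against floating-point ties.
        if q_statistic(sim, n) >= q_obs - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (reps + 1)


# Hypothetical contaminated-well counts, invented purely for illustration.
contaminated = [4, 9, 5, 3, 11]
q = q_statistic(contaminated)
p_approx = p_value_df4(q)
p_sim = simulated_p_value(contaminated)
reject = q > 9.488  # 5% critical value for 4 degrees of freedom
print(f"Q = {q:.3f}, approx p = {p_approx:.4f}, simulated p = {p_sim:.4f}, reject = {reject}")
```

As a sanity check on the critical value, `p_value_df4(9.488)` comes out very close to 0.05.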
In that case, further analysis would be required to see if there is an easy way to summarize the statistically significant differences. (For example, "Communities 2 and 5 have significantly worse contamination than the others.") There are 10 'components' in the sum for $Q.$ It is far from a formal test, but early clues to possible significant differences may be found among communities, if any, with components exceeding about 2.5.
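The informal screening of components could look like the sketch below. Again, the counts are invented for illustration, `q_components` is a hypothetical helper, and the 2.5 cutoff is only the rough guide given above, not a formal test.

```python
def q_components(contaminated, wells_per_community=20):
    """Return the individual terms of Q, keyed by (community, column)."""
    k = len(contaminated)
    exp_c = sum(contaminated) / k          # E_i1 = T_1 / k
    exp_n = wells_per_community - exp_c    # E_i2 = T_2 / k
    components = {}
    for i, c in enumerate(contaminated, start=1):
        n = wells_per_community - c
        components[(i, "C")] = (c - exp_c) ** 2 / exp_c
        components[(i, "N")] = (n - exp_n) ** 2 / exp_n
    return components


# Hypothetical contaminated-well counts, invented purely for illustration.
contaminated = [4, 9, 5, 3, 11]
components = q_components(contaminated)

# Informal clue only: components larger than about 2.5.
flagged = [cell for cell, value in components.items() if value > 2.5]
for cell, value in components.items():
    marker = "  <-- large" if cell in flagged else ""
    print(f"community {cell[0]}, column {cell[1]}: {value:.3f}{marker}")
```

With these made-up counts, only the C cell for community 5 exceeds 2.5, which would suggest looking first at whether that community has more contamination than the rest.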
If you do not reject the null hypothesis, then you cannot say that there are no important differences among communities--only that the available data do not reliably reflect significant differences.