Hi, we are trying to determine the statistical significance of our summer water quality survey. 72 water samples were taken in each of two consecutive summer seasons: three sampling sites per week and three samples per site, for eight weeks.
Number of samples per season $= 3 \times 3 \times 8 = 72$
Each sample is cultured in a petri dish on a chromogenic medium that produces a different color for each bacteria species. Five colors can be identified.
A positive result is a 'dot' of one of the five colors. In one sample you can have all five colors or none. The number of dots depends on the quantity of bacteria present in the sample and the color determines the bacteria type.
Let $W$ be the family of colors, $w$ the quantity observed, and $X$ a collection of three samples. Then
$x_i = \{w_a, w_b, w_c, w_d, w_e\}$ is a possible outcome of one sample, where each $w \ge 0$.
For each week, the dot counts for each color group at a site are averaged over that site's $n = 3$ samples, giving the average of each color group for one site in one week:
$$\left(\frac{1}{n}\sum_{i=1}^{3} w_{ai},\ \frac{1}{n}\sum_{i=1}^{3} w_{bi},\ \frac{1}{n}\sum_{i=1}^{3} w_{ci},\ \frac{1}{n}\sum_{i=1}^{3} w_{di},\ \frac{1}{n}\sum_{i=1}^{3} w_{ei}\right)$$
What is the probability that one color group will have the greatest average value at all three sites in the same week?
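To make the event in the question concrete, here is a minimal Python sketch that computes the per-site averages and checks whether a single color has the greatest average at every site in one week. The data layout (one list of five counts per sample, in color order a to e) and the names `site_averages` and `weekly_leader` are hypothetical, and the example counts are made up, not survey data. Ties for the top average are treated as "no single leader", which is one possible convention.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: each sample lists its dot counts for color groups a..e in this order
COLORS = ("a", "b", "c", "d", "e")


def site_averages(samples: Sequence[Sequence[int]]) -> Tuple[float, ...]:
    """Average dot count per color group over the samples taken at one site."""
    n = len(samples)  # n = 3 samples per site in this survey
    return tuple(sum(row[k] for row in samples) / n for k in range(len(COLORS)))


def weekly_leader(sites: Sequence[Sequence[Sequence[int]]]) -> Optional[str]:
    """Return the color whose average is strictly greatest at every site, else None."""
    leaders = set()
    for samples in sites:
        avgs = site_averages(samples)
        top = max(avgs)
        if avgs.count(top) > 1:  # tie for the largest average: no single leader here
            return None
        leaders.add(COLORS[avgs.index(top)])
    return leaders.pop() if len(leaders) == 1 else None


# Made-up counts for one week (three sites, three samples each), for illustration only
week = [
    [[3, 0, 1, 0, 0], [6, 1, 0, 0, 0], [0, 2, 2, 0, 0]],  # site 1
    [[5, 1, 0, 0, 0], [2, 0, 0, 1, 0], [4, 3, 0, 0, 0]],  # site 2
    [[1, 0, 0, 0, 0], [2, 1, 1, 0, 0], [0, 0, 0, 0, 0]],  # site 3
]
print(weekly_leader(week))  # a
```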
I was assuming it would look like this:
$x = \frac{1}{n}\sum_{i=1}^{3} w_{ai}$
Probability that $x$ is greater than all other $x_j$ at one site $= 1/5$
Probability that this repeats at all three sites in the same week $= (1/5)^3 = 0.008$
Probability that this happens twice $= 0.008^2 = 0.000064$
Therefore this is an unlikely outcome and statistically significant.
For full details, see: http://mwshovel.pythonanywhere.com/dirt/microbiology.html
This is the chart of year-over-year results:
If all five colors are equally likely, and if the chance of a tie for "largest number" is negligible, and if the results at each site have independent probabilities, then in any given week there is indeed a $(1/5)^3$ chance that color A has the most dots at all three sites.
In the same week, there is also a $(1/5)^3$ chance that color B has the most dots. And the same chances for colors C, D, and E.
In total, there is a $5 \times (1/5)^3 = 0.04$ chance that some single color will have the most dots at all three sites in a given week.
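The $0.04$ figure can be checked both exactly and by a quick Monte Carlo simulation under the same assumptions (five equally likely colors, no ties, independent sites). This is a sketch of that check, not part of the original analysis; the function names and the simulation parameters (`trials`, `seed`) are illustrative choices.

```python
import random


def exact_same_leader_prob(colors: int = 5, sites: int = 3) -> float:
    # Sum over colors of P(that color leads at every site) = colors * (1/colors)**sites
    return colors * (1 / colors) ** sites


def simulate_same_leader_prob(trials: int = 100_000, colors: int = 5,
                              sites: int = 3, seed: int = 1) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Null model: each site's leading color is uniform and independent of the others
        leaders = {rng.randrange(colors) for _ in range(sites)}
        if len(leaders) == 1:
            hits += 1
    return hits / trials


print(f"{exact_same_leader_prob():.3f}")  # 0.040
print(simulate_same_leader_prob())        # should land close to 0.04
```

Note that $5 \times (1/5)^3 = (1/5)^2$: once the first site's leader is fixed, the other two sites only have to match it.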
But there were many weeks of sampling, so the chance that this would happen at least once is much greater than $0.04.$
For example, looking just at the 2016 data, again assuming the probabilities are independent, the chance that there will be a single color with the most dots at all three sites in at least one of the eight weeks is $$ 1 - (0.96)^8 \approx 0.28. $$ So the fact that this did happen once is not particularly significant.
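The "at least once" calculation is the complement of "never in any week". A minimal sketch, assuming independent weeks and the per-week probability $p = 0.04$ from above:

```python
def prob_at_least_once(p: float = 0.04, weeks: int = 8) -> float:
    # P(at least one such week) = 1 - P(no such week in any of the independent weeks)
    return 1 - (1 - p) ** weeks


print(f"{prob_at_least_once():.2f}")  # 0.28 for the eight weeks of 2016
```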
On the other hand, if you look at both years, and ignore the "UV E. coli" data from 2017 (which were not recorded in 2016), you have three times when one color had the most dots at all three sites in one week. Under the same probability assumptions as before, this is the chance that a binomial variable with $n=16$ and $p = 0.04$ has a value greater than $2.$ That probability is approximately $0.024.$
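The binomial tail can be computed directly by summing the probability mass function from $3$ to $16$. This is a sketch using only the standard library, under the same independence assumptions; `binomial_upper_tail` is an illustrative name.

```python
from math import comb


def binomial_upper_tail(k: int, n: int = 16, p: float = 0.04) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


print(f"{binomial_upper_tail(3):.3f}")  # P(X > 2) = P(X >= 3), prints 0.024
```

If SciPy is available, `scipy.stats.binom.sf(2, 16, 0.04)` computes the same survival probability $P(X > 2)$.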