Alternatives to Kruskal-Wallis or one-way ANOVA for small sample sizes


I have a set of measurements grouped by year, with only one recorded measurement per year, as in the example below:

measure           | group
0.918013116736037 | 2008
0.87700851127606  | 2009
0.865305712564198 | 2010
0.885404924545329 | 2011
0.91228191516954  | 2012
0.868529155787641 | 2013
0.844761039845612 | 2014
0.892400123529577 | 2015

I want to use a statistical test to say whether there is a 'significant' difference between these measures or not. I know that ANOVA or the Kruskal-Wallis test is normally used for such problems, but because of the small sample size per group I don't think either will be adequate.

I also cannot generate more data: each measure is the classification accuracy on the data collected in one group (one year), so repeating the classification on the same sample would be like cheating.

Any suggestions?

Best answer

You cannot do any meaningful hypothesis test to see if years are significantly different with only one observation per year. You would need some way to judge whether variability among years is surprisingly large compared to variability within years. But with only one observation per year, you have no way to estimate variability within years. (To put it another way, there is no such thing as "groups" of size 1 in such tests.)
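To make the point concrete, here is a short sketch using the standard textbook forms of the two test statistics (the notation $k$ groups, $N$ total observations, group sizes $n_i$, and rank sums $R_i$ is introduced here for illustration). The one-way ANOVA statistic is

$$F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}.$$

With one observation per year, $N = k = 8$, so $SS_{\text{within}} = 0$ and $N - k = 0$: the denominator is $0/0$ and $F$ is undefined. The Kruskal-Wallis statistic is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1).$$

With every $n_i = 1$ and no ties, the $R_i$ are just the integers $1, \dots, N$ in some order, so $\sum R_i^2 = N(N+1)(2N+1)/6$ and

$$H = 2(2N+1) - 3(N+1) = N - 1.$$

For your data that gives $H = 7$ no matter what the eight values are, so the statistic carries no information about differences among years.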

Instead, I suggest you look at the variability among years in a few different ways.

First, you might look at the standard deviation of the yearly measurements. It is roughly 0.02. Knowing what you do about these measurements and what they mean, do you think this is a large or small amount of variability? (The mean of your data is about 0.88. Roughly speaking, there should be few instances in which any one year is more than three standard deviations from the mean in either direction: $0.88 \pm 0.06.$ Does that seem reasonable based on what you know about these measurements?)
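The summary numbers above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative, and it assumes the sample standard deviation (denominator $n - 1$) is the one intended.

```python
import statistics

# Yearly accuracy values copied from the question (2008-2015).
measures = {
    2008: 0.918013116736037,
    2009: 0.87700851127606,
    2010: 0.865305712564198,
    2011: 0.885404924545329,
    2012: 0.91228191516954,
    2013: 0.868529155787641,
    2014: 0.844761039845612,
    2015: 0.892400123529577,
}

values = list(measures.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD, i.e. n - 1 in the denominator

# Three-standard-deviation band around the mean.
lower, upper = mean - 3 * sd, mean + 3 * sd
outside = [year for year, v in measures.items() if not lower <= v <= upper]

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"3-SD band = [{lower:.3f}, {upper:.3f}]")
print("years outside the band:", outside)
```

Because the code uses the unrounded standard deviation, its band comes out slightly wider than the rounded $0.88 \pm 0.06$ quoted above; the conclusion is the same either way.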

Second, I rounded your observations to three places (for convenience) and made a boxplot of them--including the usual feature of boxplots that notes outliers. There are no outliers here.

[Figure: boxplot of the eight yearly measurements]
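The outlier check that a boxplot performs can also be done numerically. The sketch below assumes the usual 1.5 × IQR fence rule and the default quartile convention of Python's `statistics.quantiles`; plotting software may compute quartiles slightly differently, so the fences need not match the plot exactly.

```python
import statistics

# Observations from the question, rounded to three places as in the boxplot.
rounded = [0.918, 0.877, 0.865, 0.885, 0.912, 0.869, 0.845, 0.892]

# Quartiles (three cut points for n=4); the convention is an assumption.
q1, median, q3 = statistics.quantiles(rounded, n=4)
iqr = q3 - q1

# Fences: points beyond 1.5 * IQR from the quartiles are flagged as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in rounded if x < low_fence or x > high_fence]

print(f"Q1 = {q1:.3f}, median = {median:.3f}, Q3 = {q3:.3f}")
print(f"fences = [{low_fence:.3f}, {high_fence:.3f}]")
print("outliers:", outliers)
```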

Third, I also plotted your observations in sequence. There is no clear trend of decrease or increase across the years for which you have data. (With so few observations, it is very dangerous to speculate about cycles, but there may be a hint of down-up-down-up cycles in your data. With more years of data you might eventually be in a better position to look for meaningful cycles.)

[Figure: yearly measurements plotted in sequence, 2008 to 2015]

Note: If it is possible to go back to the original data and find monthly (or quarterly) accuracy scores, then you would have 12 (or 4) observations per year, and you could do a Kruskal-Wallis or some other test to see if there is a significant difference from year to year.
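If several observations per year do become available, the Kruskal-Wallis statistic is simple to compute. The following is a rough sketch, not a full implementation: the function name `kruskal_wallis_h` is hypothetical, it uses mid-ranks for ties but omits the tie correction factor, and the numbers in `toy_groups` are made up purely to exercise the code (they are not accuracy data).

```python
from collections import defaultdict


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    positions = defaultdict(list)
    for rank, value in enumerate(pooled, start=1):
        positions[value].append(rank)
    mid_rank = {v: sum(r) / len(r) for v, r in positions.items()}

    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(sum(mid_rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Toy values only, chosen to exercise the function; NOT real accuracy scores.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(toy_groups)
print(f"H = {h:.2f} on {len(toy_groups) - 1} degrees of freedom")
```

H is usually referred to a chi-squared distribution with (number of groups − 1) degrees of freedom, an approximation that is rough for very small groups. In practice a library routine such as `scipy.stats.kruskal`, which also returns a p-value, is the more convenient choice.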