Kruskal Wallis - Effect size


I am analysing 4 algorithms, with 3 sets of metrics for each algorithm, and I apply the non-parametric Kruskal-Wallis test to each metric to detect any differences in performance between these algorithms.

I would like to know whether there is a way to calculate the effect size when applying the Kruskal-Wallis test.

As mentioned in other posts on CV, a post-hoc analysis for Kruskal-Wallis should use Dunn's test rather than the Mann-Whitney test for pairwise comparisons between groups (algorithms).

By applying the "inaccurate" MW test, I can calculate the effect size, but what can I do if I apply Dunn's test?
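For the Mann-Whitney case mentioned above, a common effect size is $r = |z|/\sqrt{N}$, where $z$ comes from the normal approximation to the $U$ statistic; the rank-biserial correlation $1 - 2U/(n_1 n_2)$ is another option. A minimal sketch, using hypothetical data for two algorithms (the group sizes, distributions, and the tie-free normal approximation are all assumptions for illustration):

```python
# Sketch: r and rank-biserial effect sizes from a Mann-Whitney U test.
import math
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 30)   # hypothetical metric values, algorithm A
b = rng.normal(0.5, 1.0, 30)   # hypothetical metric values, algorithm B

u, p = mannwhitneyu(a, b, alternative="two-sided")
n1, n2 = len(a), len(b)

# z from the normal approximation to U (ignoring tie correction)
mu = n1 * n2 / 2
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu) / sigma

r = abs(z) / math.sqrt(n1 + n2)   # r = |z| / sqrt(N)
rb = 1 - 2 * u / (n1 * n2)        # rank-biserial correlation, in [-1, 1]
```

Since Dunn's test also produces a z statistic per comparison, the same $r = |z|/\sqrt{N}$ idea is sometimes applied there, with $N$ the combined size of the two groups being compared.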

Thanks in advance for any comment/advice.

PS: I posted this question to CV some time ago, but I haven't received any reply yet. Hence, I am posting it in this forum too.


First, Dunn-Bonferroni tests are to be done only if the Kruskal-Wallis test indicates there are some differences among the groups. Let's suppose you are testing at the 5% level. Once we know there is some pattern of differences, we can try to discover what that pattern is by making pairwise comparisons among the $g$ treatment groups.

There are $c = C(g, 2) = g(g-1)/2$ possible paired comparisons. If we make each of these at the 5% level, there is a possibility that the 'grand' or 'family' error probability for a pattern of differences emerging from the paired comparisons will be substantially more than 5%.

A Bonferroni procedure is based on the Bonferroni inequality of probability theory. In your application, the idea is that if we test all $c$ comparisons at the level $.05/c$, then the family error probability of the pattern cannot exceed 5%.

So if you have $g = 4$ groups, then $c = 6$ and you should make each multiple comparison at level $.05/6 = 0.0083$. You could use six 2-sample Wilcoxon tests at that level, or you could look at six Wilcoxon confidence intervals for differences in medians at confidence level $1 - 0.0083$, or about 99.2%.
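The procedure above can be sketched as follows. This is a minimal illustration, assuming $g = 4$ hypothetical algorithm groups with made-up data, and using pairwise Mann-Whitney tests in place of Dunn's test (as the answer suggests for hand computation):

```python
# Sketch: Kruskal-Wallis omnibus test, then Bonferroni-corrected
# pairwise Wilcoxon (Mann-Whitney) comparisons among g = 4 groups.
from itertools import combinations
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(1)
# Hypothetical data: 4 algorithms, 25 observations each, shifted locations.
groups = {f"alg{i}": rng.normal(loc=i * 0.4, scale=1.0, size=25)
          for i in range(4)}

h, p_kw = kruskal(*groups.values())
g = len(groups)
c = g * (g - 1) // 2    # number of pairwise comparisons: 6 for g = 4
alpha = 0.05 / c        # Bonferroni-adjusted per-test level, about 0.0083

results = {}
if p_kw < 0.05:         # proceed only if the omnibus test rejects
    for (name_a, x_a), (name_b, x_b) in combinations(groups.items(), 2):
        _, p = mannwhitneyu(x_a, x_b, alternative="two-sided")
        results[(name_a, name_b)] = (p, p < alpha)
```

Each entry of `results` records the raw p-value and whether it survives the Bonferroni-corrected threshold; the family error probability for the six comparisons together cannot exceed 5%.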

There is no guarantee that the pattern will be absolutely clear. For example, if we have three levels 1, 2, and 3 in increasing order of sample medians, it is possible you might find a clear difference between extremes 1 and 3, but not be able to resolve whether 2 is significantly different from either 1 or 3.

Caveats: I wonder what you mean when you say you have 4 algorithms and 3 sets of metrics. From that description, I have no idea what your experimental design looks like. Are you doing three separate Kruskal-Wallis tests, one for each 'set of metrics'? If so, what I have said above is OK, with $g = 4.$

Or do you have a two factor design, in which one factor is 'algorithm' and the other is 'metric'? In that case I don't see how a Kruskal-Wallis test can give an appropriate analysis.

If you want to say what the factors in your design are and how many replications you have for each factor (or combination thereof), maybe my advice would be different.

Also, I'm wondering what kind of data you have that causes you to think in terms of nonparametric tests.