Comparing the performance of two algorithms with the Wilcoxon signed-rank test

I have two algorithms, A and B, that each compute a solution to some problem. Each solution is assigned an objective value that indicates its quality. I need to perform a Wilcoxon signed-rank test to check whether there is any evidence that the two algorithms perform significantly differently from one another.

I have performed 12 trials of each algorithm and tabulated the objective values from solutions found during each trial. A smaller objective value is better.

 A   B
878 890
872 888
865 879
877 874
872 870
890 886
873 871
887 879
868 873
888 882
878 881
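For concreteness, here are the quantities the signed-rank test works with for the 11 rows tabulated above. This assumes row $i$ of A is paired with row $i$ of B, takes differences $d_i = A_i - B_i$ (so a negative $d_i$ means A did better on that row), ranks the $|d_i|$ with tied values given the average of their ranks, and sums the ranks separately over positive and negative differences:

$$
\begin{aligned}
d_i &= -12,\ -16,\ -14,\ 3,\ 2,\ 4,\ 2,\ 8,\ -5,\ 6,\ -3\\
\operatorname{rank}|d_i| &= 9,\ 11,\ 10,\ 3.5,\ 1.5,\ 5,\ 1.5,\ 8,\ 6,\ 7,\ 3.5\\
W^+ &= 3.5+1.5+5+1.5+8+7 = 26.5\\
W^- &= 9+11+10+6+3.5 = 39.5\\
W^+ + W^- &= \tfrac{11\cdot 12}{2} = 66
\end{aligned}
$$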

I am confused about a few details of performing this test.

  • Should I do a one-tailed or two-tailed test?

  • I'm not sure what my null hypothesis should be. What should it be, given that I want to find out whether algorithms A and B perform significantly differently from one another?

  • If the $p$-value is $> 0.05$, what does this mean?

  • If the $p$-value is $< 0.05$, what does this mean?

Answer

If your null hypothesis is that the two algorithms perform the same (for the signed-rank test, that the paired differences are symmetric about zero, so their median is zero) and your alternative hypothesis is that they differ, then you will regard $B$ being higher than $A$ as just as extreme a result as $A$ being higher than $B$. So you want a two-tailed test.
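In symbols, with $d_i = A_i - B_i$ for the $n$ nonzero paired differences and $W^+$ the sum of the ranks of $|d_i|$ over the positive $d_i$, the two-tailed test counts deviations of $W^+$ from its null mean in either direction, whereas a one-tailed test of "A is better" (smaller objective values, so negative $d_i$) would count only the lower tail:

$$
\begin{aligned}
H_0 &: \text{the } d_i \text{ are symmetric about } 0, \qquad H_1: \text{they are shifted away from } 0 \text{ (either direction)}\\
\mathbb{E}_{H_0}[W^+] &= \tfrac{n(n+1)}{4}\\
p_{\text{two-tailed}} &= P_{H_0}\!\left(\left|W^+ - \tfrac{n(n+1)}{4}\right| \ge \left|w^+_{\text{obs}} - \tfrac{n(n+1)}{4}\right|\right)\\
p_{\text{one-tailed}} &= P_{H_0}\!\left(W^+ \le w^+_{\text{obs}}\right)
\end{aligned}
$$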

The $p$-value is the probability, if the null hypothesis is true, of seeing differences as extreme as or more extreme than the ones you actually saw. If $p < 0.05$, then under the null hypothesis you would expect, on average, to see results like yours fewer than one time in twenty; if $p > 0.05$, you would expect to see them more than one time in twenty. So the former is the stronger indication from your observations that the null hypothesis may not be correct: at the usual $0.05$ level you would reject it and conclude that A and B perform differently. With $p > 0.05$ you would not reject it, which is not evidence that the algorithms perform the same, only a lack of evidence that they differ.
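The reasoning above can be checked on the tabulated data. Below is a minimal sketch in plain Python (no SciPy) that computes the exact two-sided $p$-value by brute force: under the null hypothesis each rank is equally likely to carry a plus or a minus sign, so enumerating all $2^n$ sign patterns gives the null distribution of $W^+$. It assumes row $i$ of A is paired with row $i$ of B, drops zero differences, and gives tied absolute differences their average rank; the helper names `doubled_signed_ranks` and `exact_two_sided_p` are hypothetical, not from any library.

```python
from itertools import product

# Objective values from the table above (smaller is better); row i of A is
# assumed to be paired with row i of B.
A = [878, 872, 865, 877, 872, 890, 873, 887, 868, 888, 878]
B = [890, 888, 879, 874, 870, 886, 871, 879, 873, 882, 881]


def doubled_signed_ranks(a, b):
    """Signed average ranks of the nonzero differences a_i - b_i, multiplied by 2.

    Doubling keeps tied (average) ranks such as 1.5 as integers, so the
    enumeration below uses exact integer arithmetic.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    doubled = [0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run order[i..j] of equal absolute differences (a tie group).
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # That run occupies 1-based ranks i+1..j+1; twice their average is i+j+2.
        for k in range(i, j + 1):
            doubled[order[k]] = i + j + 2
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(doubled, diffs)]


def exact_two_sided_p(a, b):
    """Return (W+, exact two-sided p-value) by enumerating all 2**n sign patterns."""
    signed = doubled_signed_ranks(a, b)
    ranks = [abs(r) for r in signed]
    total = sum(ranks)                            # = n(n+1), i.e. 2 * (1 + ... + n)
    w_plus2 = sum(r for r in signed if r > 0)     # = 2 * W+
    # Under H0, E[2*W+] = total/2. Compare |2*W+ - total/2| across patterns,
    # multiplied by 2 again so the comparison stays in integers.
    observed = abs(2 * w_plus2 - total)
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        w2 = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(2 * w2 - total) >= observed:
            extreme += 1  # at least as far from the null mean, in either tail
    return w_plus2 / 2, extreme / 2 ** len(ranks)


alpha = 0.05
w_plus, p = exact_two_sided_p(A, B)
print(f"W+ = {w_plus}, two-sided p = {p:.4f}")
if p < alpha:
    print("p < 0.05: reject H0, evidence that A and B perform differently.")
else:
    print("p >= 0.05: do not reject H0 at the 5% level.")
```

For this table it reports $W^+ = 26.5$ and a $p$-value well above $0.05$, so on these 11 rows there is no significant evidence of a difference. Brute-force enumeration is only practical for small $n$; with SciPy available, `scipy.stats.wilcoxon(A, B)` runs the same test (two-sided by default), though its $p$-value may differ slightly depending on how it handles the tied ranks.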