Which statistical test to use? Is the Wilcoxon Signed-Rank Test Appropriate?


I have a dataset of about 20 points where, for each point, I have the actual measurement from one tool and the measurement given by a second tool. Neither set of measurements is normally distributed. I want to prove that the measurements given by the second tool are wildly different from those of the first tool (the accepted correct tool). The difference is apparent just by looking at the data or at the percent difference between the two. I don't know which test to use for this. I am thinking of the Wilcoxon signed-rank test. I ran it and got a p-value of 0.0007972, which is what I need. Is this test appropriate though?

Thanks in advance


There are 2 best solutions below

0
On BEST ANSWER

If the data are continuous (so that tied observations are nonexistent or rare) and the 20 observations from the first tool are paired with those from the second tool (the same randomly chosen individuals have scores from both tools), then the Wilcoxon signed-rank test is appropriate.

However, to be fussy with language, you cannot "prove" that the two tools are "wildly different." You can reject the null hypothesis that the median of the differences in scores between the two tools, for randomly chosen individuals from your population of interest, is zero.

Computing the test statistic begins by taking the difference $D_i = X_i - Y_i$ for each subject $i,$ where $X_i$ is the subject's score by Tool 1 and $Y_i$ is the same subject's score by Tool 2.
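To make the remaining steps concrete, here is a minimal sketch of how the signed-rank sums can be computed from the $D_i$. The function name `signed_rank_statistic` and the five pairs of scores are made up for illustration. Dropping zero differences and giving tied $|D_i|$ their average rank is one common convention, assumed here.

```python
def signed_rank_statistic(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Assumed conventions: zero differences are dropped, and tied |D_i|
    share the average of the ranks they span.
    """
    d = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    # Indices of d sorted by absolute difference, smallest first.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the absolute differences are tied.
        while end + 1 < len(order) and abs(d[order[end + 1]]) == abs(d[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average of 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus


# Made-up scores for five subjects (not the asker's data).
tool1 = [10.0, 12.0, 9.0, 15.0, 11.0]
tool2 = [13.0, 11.0, 14.0, 21.0, 13.0]
w_plus, w_minus = signed_rank_statistic(tool1, tool2)
print(w_plus, w_minus)  # prints: 1.0 14.0
```

Here the differences are $-3, 1, -5, -6, -2,$ so only the smallest absolute difference is positive and it carries rank 1. The two sums always add up to $n(n+1)/2$ for the $n$ nonzero differences.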

If the Wilcoxon signed-rank statistic gives a P-value of $0.001,$ you can say that the population median difference is "significantly different" from 0, in a statistical sense, at the $0.1\%$ level. That is, if the population median difference were 0, a result at least this extreme would occur only about once in a thousand such 20-subject experiments.
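As a sketch of how such a P-value might be obtained in practice, the following uses `scipy.stats.wilcoxon` on 20 hypothetical pairs. The numbers are invented for illustration and are not the asker's data; they are chosen so that the absolute differences are all distinct and nonzero.

```python
from scipy.stats import wilcoxon

# Hypothetical paired scores for 20 subjects (invented for illustration).
tool1 = [52, 47, 61, 55, 49, 58, 63, 50, 45, 57,
         60, 53, 48, 56, 62, 51, 46, 59, 54, 64]
tool2 = [64, 54, 76, 53, 58, 76, 67, 61, 65, 63,
         74, 50, 56, 73, 67, 64, 56, 78, 70, 65]

# Two-sided signed-rank test on the differences tool1 - tool2.
result = wilcoxon(tool1, tool2)
print(result.statistic, result.pvalue)
```

For the default two-sided test, the reported statistic is the smaller of the two signed-rank sums. In this made-up sample only two differences are positive, and they are among the smallest in absolute value, so the P-value comes out well below $0.001$.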

Notes: (a) If the data are not essentially continuous, there may be ties among the absolute differences $|D_i|.$ Ties interfere with ranking in a way that makes the traditional theory for finding P-values difficult to use, especially if the proportion of ties is relatively large. (In that case, you may need to see if a simulated permutation test can provide a reliable P-value.)
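One way such a simulated permutation test could look is sketched below. It relies on the assumption that, under the null hypothesis, the differences are symmetric about 0, so each rank is equally likely to carry a plus or a minus sign. The names `midranks` and `sign_flip_pvalue`, the number of simulations, and the heavily tied differences are all invented for illustration.

```python
import random


def midranks(values):
    """1-based ranks of values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def sign_flip_pvalue(diffs, n_sim=20000, seed=0):
    """Simulated two-sided P-value for the signed-rank sum W+."""
    d = [di for di in diffs if di != 0]          # drop zero differences
    ranks = midranks([abs(di) for di in d])
    center = sum(ranks) / 2                      # mean of W+ under the null
    observed = abs(sum(r for r, di in zip(ranks, d) if di > 0) - center)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Randomly reassign signs: each rank is positive with probability 1/2.
        w_plus = sum(r for r in ranks if rng.random() < 0.5)
        if abs(w_plus - center) >= observed:
            hits += 1
    # Add-one estimate, so the simulated P-value is never exactly 0.
    return (hits + 1) / (n_sim + 1)


# Made-up differences with many ties (e.g. measurements rounded to integers).
diffs = [-3, -2, -2, -1, -3, 0, -2, 1, -4, -2,
         -1, -3, -2, -5, -1, -2, 0, -3, -2, -1]
p_sim = sign_flip_pvalue(diffs)
print(p_sim)
```

Because the reference distribution is simulated from the observed ranks themselves, ties do not need any special correction here. The price is that the P-value is only resolved to about $1/n_{\text{sim}}.$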

(b) It is possible for two tools to give different values, even if both tools are useful. For example, if $Y_i = 2X_i + 7,$ then the $D_i$ are unlikely to have a median near 0. But it is easy to convert from values by one tool to values by the other, so both may be equally useful.
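A small numerical sketch of note (b), using made-up Tool 1 values and assuming Tool 2 follows $Y_i = 2X_i + 7$ exactly: the median of the $D_i$ is far from 0, yet an ordinary least-squares line recovers the conversion between the two tools.

```python
from statistics import mean, median

x = [3.0, 5.0, 8.0, 10.0, 12.0, 15.0]     # made-up Tool 1 values
y = [2 * xi + 7 for xi in x]              # Tool 2, assumed to follow Y = 2X + 7

# Differences D_i = X_i - Y_i are all strongly negative.
d = [xi - yi for xi, yi in zip(x, y)]
print(median(d))  # prints: -16.0

# Ordinary least-squares fit of y on x recovers the conversion rule.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
print(round(slope, 6), round(intercept, 6))
```

With real data the fit would not be exact, but a tight linear relationship would still suggest that one tool's values can be converted to the other's.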

(c) If you have 20 subjects measured by Tool 1 and 20 different subjects measured by Tool 2, then you do not have 'paired' data. However, it may be possible to use a (two-sample) Mann-Whitney-Wilcoxon rank sum test.
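For the unpaired situation in note (c), a rank sum test might be run as sketched below with `scipy.stats.mannwhitneyu`. The two groups are invented, and only 10 subjects per group are used to keep the example short.

```python
from scipy.stats import mannwhitneyu

# Made-up scores from two DIFFERENT groups of subjects (unpaired data).
group1 = [52, 47, 61, 55, 49, 58, 63, 50, 45, 57]   # measured by Tool 1
group2 = [64, 71, 76, 59, 68, 80, 67, 73, 65, 70]   # measured by Tool 2

# Two-sample Mann-Whitney-Wilcoxon rank sum test, two-sided alternative.
result = mannwhitneyu(group1, group2, alternative="two-sided")
print(result.statistic, result.pvalue)
```

Note that this test compares the two groups as wholes. It ignores any pairing, so it should not be used in place of the signed-rank test when the same subjects were measured by both tools.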

0
On

The signed-rank test will tell you if one variate tends to be larger than the other, or smaller than the other. However, "wildly different" suggests something else. For example, if the second tool were far too high on low measurements and far too low on high measurements, that could be "wildly different" but would not tend to be detected by a signed-rank test (because the second tool could be too low about half the time and too high about half the time).
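A constructed example of this situation, with entirely made-up numbers: the hypothetical second tool reverses the scale, so it reads far too high on low values and far too low on high values, yet the two signed-rank sums are nearly balanced.

```python
x = list(range(10, 201, 10))        # made-up reference values: 10, 20, ..., 200
y = [212 - xi for xi in x]          # hypothetical second tool that reverses the scale
d = [xi - yi for xi, yi in zip(x, y)]

# Rank the absolute differences (this construction has no ties or zeros,
# so plain sorting is enough).
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
rank = {i: r for r, i in enumerate(order, start=1)}

w_plus = sum(rank[i] for i, di in enumerate(d) if di > 0)
w_minus = sum(rank[i] for i, di in enumerate(d) if di < 0)
mean_abs_diff = sum(abs(di) for di in d) / len(d)

print(w_plus, w_minus, mean_abs_diff)  # prints: 100 110 100.0
```

With 20 pairs, the signed-rank sum has mean $20 \cdot 21 / 4 = 105$ under the null hypothesis, so sums of 100 and 110 give no evidence of a difference, even though the two tools disagree by 100 units on average.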