Wilcoxon signed rank test and the pseudomedian

I need help understanding the Wilcoxon signed rank test and the pseudomedian, more specifically the confidence interval for the pseudomedian.

Say I have the observations $-2,3,6,10,20$.

To find the pseudomedian, I first need the Walsh averages $\frac{X_{i}+X_{j}}{2}$ for $i \le j$:

$-2,0.5,2,3,4,4.5,6,6.5,8,9,10,11.5,13,15,20$
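For reference, here is a minimal base-R sketch of this computation. `outer()` forms all pairwise averages, and keeping the upper triangle (diagonal included) gives the $n(n+1)/2 = 15$ Walsh averages with $i \le j$. Their median is the Hodges-Lehmann estimate, which is what `wilcox.test()` reports as the "(pseudo)median".

```r
x <- c(-2, 3, 6, 10, 20)

# All pairwise averages (X_i + X_j)/2; keep only i <= j
# (upper triangle including the diagonal)
pair_avg <- outer(x, x, "+") / 2
walsh <- sort(pair_avg[upper.tri(pair_avg, diag = TRUE)])

# sorted: -2, 0.5, 2, 3, 4, 4.5, 6, 6.5, 8, 9, 10, 11.5, 13, 15, 20
walsh
length(walsh)   # 15 = n(n + 1)/2
median(walsh)   # pseudomedian (Hodges-Lehmann estimate): 6.5
```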

From here it's clear that the pseudomedian is 6.5: it is the median of the 15 Walsh averages, the 8th smallest, equally far from both ends of the sorted list. How can I find a 95% confidence interval for this pseudomedian?

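With only $n = 5$ observations, you are right about the Walsh averages and the pseudomedian. The Wilcoxon signed rank test gives the CI $(-2, 20)$, but it is only approximately a 95% CI. With $n = 5$, even the widest possible interval, from the smallest to the largest Walsh average, has exact confidence level $1 - 2/32 = 93.75\%$. In R: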

```r
x = c(-2, 3, 6, 10, 20)
wilcox.test(x, conf.int = TRUE)

        Wilcoxon signed rank test

data:  x
V = 14, p-value = 0.125
alternative hypothesis: true location is not equal to 0
95 percent confidence interval:
 -2 20
sample estimates:
(pseudo)median 
           6.5 
```
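To show where these numbers come from, here is a sketch of the exact computation in base R, assuming a small sample with no ties or zeros. Under $H_0$ each of the $2^n = 32$ sign patterns is equally likely, and `psignrank()`/`qsignrank()` give the null distribution of $V$. The interval runs from the $k$-th smallest to the $k$-th largest Walsh average. The rule for choosing $k$ below is my reading of how `wilcox.test()` handles the exact case, so treat it as an assumption. With $n = 5$ only $k = 1$ is possible, which is why the interval is the full range of the Walsh averages.

```r
x <- c(-2, 3, 6, 10, 20)
n <- length(x)

# Signed rank statistic V: sum of the ranks of |x| that belong to positive x
r <- rank(abs(x))
V <- sum(r[x > 0])                      # 2 + 3 + 4 + 5 = 14

# Exact two-sided p-value from the null distribution of V
p_upper <- 1 - psignrank(V - 1, n)      # P(V >= 14) = 2/32
p_value <- min(1, 2 * p_upper)          # 0.125

# Sorted Walsh averages (i <= j)
pair_avg <- outer(x, x, "+") / 2
walsh <- sort(pair_avg[upper.tri(pair_avg, diag = TRUE)])
N <- length(walsh)                      # n(n + 1)/2 = 15

# Assumed rule: k is the alpha/2 quantile of V, raised to at least 1
alpha <- 0.05
k <- max(1, qsignrank(alpha / 2, n))
ci <- c(walsh[k], walsh[N + 1 - k])     # (-2, 20)

# Confidence level actually achieved: 1 - 2 * P(V <= k - 1)
achieved <- 1 - 2 * psignrank(k - 1, n) # 0.9375
```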

It is not clear whether you are more interested in the theory and computations for Wilcoxon's signed rank test, or whether you are more interested in making sense of an actual small dataset. If the latter, please provide actual data.

For the sample you give in your question, one kind of 95% nonparametric bootstrap CI for the population median (the basic, or reverse-percentile, bootstrap interval) is $(-8, 14)$. It may be more useful than the CI from the Wilcoxon SR test.

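```r
set.seed(128)
h.obs = median(x); h.obs
[1] 6
# Resample n = 5 values with replacement; keep bootstrap median minus observed median
d.re = replicate(5000, median(sample(x, length(x), replace = TRUE)) - h.obs)
UL = quantile(d.re, c(.975, .025))
h.obs - UL   # basic bootstrap CI: (h.obs - q.975, h.obs - q.025)
97.5%  2.5% 
   -8    14 
```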

Five observations are hardly enough for a reliable bootstrap CI for the median, and I would not bet much on 95% coverage probability for the resulting CI. [Among the $B = 5000$ resamples, `d.re` took only five distinct values.]
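The five distinct values are no accident. A resample of size 5 has an odd number of values, so its median is always one of the original observations, and `d.re` can only equal $x_{(k)} - 6$. The sketch below (base R, binomial probabilities only) gives the exact bootstrap distribution. The resample median is at most $x_{(k)}$ exactly when at least 3 of the 5 draws are at most $x_{(k)}$, and each draw is at most $x_{(k)}$ with probability $k/5$.

```r
x <- sort(c(-2, 3, 6, 10, 20))
n <- length(x)
m <- (n + 1) / 2                        # median position for odd n (here 3)

# P(resample median <= x_(k)) = P(at least m of the n draws are <= x_(k)),
# where each draw is <= x_(k) with probability k/n
cdf <- pbinom(m - 1, n, (1:n) / n, lower.tail = FALSE)
prob <- diff(c(0, cdf))                 # P(resample median == x_(k))

data.frame(d.re = x - median(x), prob = round(prob, 4))
```

The two extreme values, $-8$ and $14$, each have probability about 0.058, which exceeds 0.025. So the 2.5% and 97.5% quantiles of `d.re` are $-8$ and $14$, and the basic interval is $(6 - 14,\ 6 + 8) = (-8, 14)$, matching the simulation.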

A 95% nonparametric bootstrap CI for the population mean is $(0.4, 13.4)$. For various reasons, I have a little more faith in this CI. One reason is that your sample of five observations is hardly symmetrical, while the Wilcoxon signed rank test, and reading the pseudomedian as a median, rely on symmetry.

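```r
x = c(-2, 3, 6, 10, 20)
set.seed(128)
a.obs = mean(x); a.obs
[1] 7.4
# Bootstrap the sample mean; keep deviations from the observed mean
d.re = replicate(10000, mean(sample(x, 5, replace = TRUE)) - a.obs)
UL = quantile(d.re, c(.975, .025))
a.obs - UL   # basic bootstrap CI for the population mean
97.5%  2.5% 
  0.4  13.4
```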