How to interpret data from Mann-Whitney U Test


So I am doing a research project and I was told to do the Mann-Whitney $U$ test. The research examines male and female work experiences by having participants rate statements from 1 to 7 (1 being strongly agree and 7 being strongly disagree). The goal is to see if there is a difference in work experience between the two groups. Using an online calculator, these were my results (key: 1st group/2nd group):

Variable: Female/Male
Observations: 720/936
Mean: 3.099/3.042
SD: 1.715/1.668

Mann-Whitney test / Two-tailed test:

U 341978.5
U (standardized) 0.532
Expected value 336960
Variance (U) 89023289.87
p-value (Two-tailed) 0.595
alpha 0.05

So I have all this great data. The first part makes sense to me, but I don't know how to read or analyze the data in the second part. Can somebody explain? Thank you!

There are 2 best solutions below

U = 341978.5

This is the value of the Mann-Whitney $U$ statistic, which is defined as $$U = \sum_{i=1}^n \sum_{j=1}^m S(X_i, Y_j), \qquad S(X, Y) = \begin{cases}1, & Y < X \\ 1/2, & Y = X \\ 0, & Y > X. \end{cases}$$ Here, $X_i$ is the response of the $i^{\rm th}$ person in the first group (women, $n = 720$), and $Y_j$ is the response of the $j^{\rm th}$ person in the second group (men, $m = 936$). Some software swaps the roles of $X$ and $Y$, which gives $mn - U$ instead, so you might want to double check which convention your calculator uses.
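As a rough illustration of the definition, here is a minimal R sketch that evaluates $S$ for every pair and sums the results. The vectors `x` and `y` are small hypothetical responses made up for this example; they are not the survey data.

```r
# Hypothetical toy responses (NOT the survey data): x = first group, y = second group.
x <- c(1, 3, 3, 5, 6)
y <- c(2, 3, 4, 4, 7, 1)

# S(X, Y): 1 if Y < X, 1/2 if tied, 0 otherwise, for every (x_i, y_j) pair.
S <- outer(x, y, function(xi, yj) (yj < xi) + 0.5 * (yj == xi))

U <- sum(S)                           # Mann-Whitney U for the first group
U_rev <- length(x) * length(y) - U    # value obtained if X and Y are swapped

c(U = U, U_rev = U_rev)
```

For these toy vectors $U = 15.5$ and the swapped version is $14.5$; the two always add up to $mn$ (here $30$), which is why it matters which group the calculator treats as the first.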

U (standardized) 0.532

This is the standardized value of the $U$-statistic, equal to $$\frac{U - \mu_U}{\sigma_U},$$ where $\mu_U$ and $\sigma_U$ are given below. Under the null hypothesis, this standardized value is approximately normally distributed with mean $0$ and variance $1$, so $0.532$ is a $z$-score; the more extreme it is, the more evidence there is to support the hypothesis that the two groups differ in their responses.

Expected value 336960

This is $\mu_U = mn/2$, where $m = 936$ and $n = 720$.

Variance (U) 89023289.87

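This is the variance of the $U$ statistic under the null hypothesis, with a correction for ties, $$\sigma_U^2 = \frac{mn}{12} \left( (m+n+1) - \sum_{i=1}^k \frac{t_i^3 - t_i}{(m+n)(m+n-1)} \right)$$ where $t_i$ is the number of responses tied at the $i^{\rm th}$ distinct value and $k$ is the number of distinct values; here the possible responses are $1$ to $7$, so $k = 7$.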
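The tie-corrected variance can be sketched in R as follows. The helper name `u_variance` is made up for this example, and because the real tie counts are not given in the question, the counts in `t_pooled` are an assumption: they are the pooled frequencies of scores 1 to 7 from the fictitious samples tabulated in the other answer.

```r
# Null variance of U with the tie correction.
# m, n: group sizes; t: number of observations tied at each distinct value.
u_variance <- function(m, n, t) {
  N <- m + n
  (m * n / 12) * ((N + 1) - sum(t^3 - t) / (N * (N - 1)))
}

m <- 936; n <- 720

# ASSUMED tie counts, for illustration only: pooled frequencies of scores 1..7
# from the fictitious samples in the other answer, not the real survey.
t_pooled <- c(79, 114, 168, 121, 97, 91, 50) + c(59, 193, 255, 188, 108, 69, 64)

v_tied   <- u_variance(m, n, t_pooled)
v_untied <- u_variance(m, n, rep(1, m + n))  # no ties: reduces to mn(m+n+1)/12

c(tied = v_tied, untied = v_untied)
```

Without ties the variance would be $mn(m+n+1)/12 = 93057120$. The reported value, $89023289.87$, is smaller, which is what the tie correction produces.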

p-value (Two-tailed) 0.595

This is the conditional probability that, given there is no difference between the two groups, you would obtain a sample that is at least as extreme as the one you observed. It is a measure of the plausibility of the data you observed, assuming the null is true. Therefore, the smaller this value, the more evidence there is to favor rejecting the null hypothesis.

alpha 0.05

This is the predefined significance level of the test and is the maximum Type I error rate you are willing to accept (i.e., you wish to limit the probability of incorrectly rejecting the null hypothesis to at most $5\%$). Since the $p$-value exceeds $\alpha$, you do not reject $H_0$, and your conclusion is that the data furnish insufficient evidence to suggest the two groups responded differently to the survey.
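The standardized statistic, the $p$-value and the decision can be reproduced from the numbers in the calculator output. The following is a small R sketch, assuming the calculator uses the plain normal approximation with no continuity correction (which is what the reported figures are consistent with).

```r
# Values copied from the calculator output in the question.
U     <- 341978.5
mu_U  <- 936 * 720 / 2    # expected value of U under H0
var_U <- 89023289.87      # variance of U as reported (tie-corrected)
alpha <- 0.05

z <- (U - mu_U) / sqrt(var_U)   # standardized U
p <- 2 * pnorm(-abs(z))         # two-tailed p-value from the standard normal

round(c(z = z, p = p), 3)       # agrees with the reported 0.532 and 0.595
reject <- p < alpha             # FALSE here: do not reject H0
reject
```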

Comment continued. Here are fictitious Likert-7 data sampled in R. They may be sufficiently similar to your real data for a useful demonstration. [The vectors p passed to R's sample function give the theoretical relative proportions of scores 1 through 7; they need not sum to 1.]

set.seed(414)
# 'prob' holds the weight vectors (the vectors p mentioned above)
x1 = sample(1:7, 720, replace=TRUE, prob = c(2,2,4,3,2,2,1))
x2 = sample(1:7, 936, replace=TRUE, prob = c(1,3,4,3,2,1,1))
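To see what the weight vectors imply, here is a short R sketch that rescales them to probabilities and computes the theoretical mean score of each population. The names `w1`, `w2`, `p1`, `p2`, `mu1` and `mu2` are introduced only for this illustration.

```r
# Weights as passed to sample(), which rescales them to probabilities internally.
w1 <- c(2, 2, 4, 3, 2, 2, 1)
w2 <- c(1, 3, 4, 3, 2, 1, 1)

p1 <- w1 / sum(w1)   # theoretical proportions of scores 1..7 for x1
p2 <- w2 / sum(w2)   # same for x2

# Theoretical (population) means implied by the weights.
mu1 <- sum(1:7 * p1)
mu2 <- sum(1:7 * p2)
c(mu1 = mu1, mu2 = mu2)
```

The population means are $59/16 = 3.6875$ and $54/15 = 3.6$, so the two fictitious populations differ only slightly.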

Tabulation and description of data:

table(x1)
x1
  1   2   3   4   5   6   7 
 79 114 168 121  97  91  50 
summary(x1)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 1.000   2.000   3.000   3.717   5.000   7.000 

table(x2)
x2
  1   2   3   4   5   6   7 
 59 193 255 188 108  69  64 
summary(x2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   3.594   5.000   7.000 

Notice that the minimums, lower quartiles, medians, upper quartiles, and maximums of the two data vectors are the same. So it is no surprise that the boxplots look the same.

boxplot(x1, x2, horizontal=T, col="skyblue2")

[Figure: horizontal boxplots of x1 and x2]

Nevertheless, the sample means differ slightly, so it is worthwhile to run a Wilcoxon rank sum test to see whether the two groups express significantly different opinions. However, as for your real data, the formal test reveals no significant difference at the 5% level: the P-value, 0.1504, is greater than 0.05.

wilcox.test(x1, x2, correct=FALSE)

        Wilcoxon rank sum test

data:  x1 and x2
W = 350600, p-value = 0.1504
alternative hypothesis: true location shift is not equal to 0

For datasets as large as these and with many ties, the P-value is found using a normal approximation to the distribution of the Wilcoxon test statistic $W$.
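The normal approximation can be checked by hand against `wilcox.test`. The sketch below uses freshly simulated, hypothetical data (an arbitrary seed and sample sizes, not the samples above) and assumes no continuity correction, matching the call used here.

```r
set.seed(1)  # arbitrary seed; hypothetical data for illustration only
x <- sample(1:7, 200, replace = TRUE)
y <- sample(1:7, 250, replace = TRUE)
m <- length(x); n <- length(y); N <- m + n

# Statistic: number of (x, y) pairs with x > y, counting ties as 1/2.
U <- sum(outer(x, y, function(a, b) (a > b) + 0.5 * (a == b)))

# Null mean and tie-corrected SD of the statistic.
t <- as.numeric(table(c(x, y)))   # sizes of the groups of tied values
mu <- m * n / 2
sigma <- sqrt((m * n / 12) * ((N + 1) - sum(t^3 - t) / (N * (N - 1))))

# Two-sided p-value from the normal approximation, no continuity correction.
p_manual <- 2 * pnorm(-abs((U - mu) / sigma))

res <- wilcox.test(x, y, exact = FALSE, correct = FALSE)
all.equal(p_manual, res$p.value)   # the two p-values should agree
```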

For the Wilcoxon rank sum statistic $W$, the mean of the test statistic differs from the one @Heropup showed you for the M-W $U$-test, but the conclusion is the same. [A simple linear transformation gets you from $W$ to $U$: the two differ by a constant, so the mean of the approximating normal distribution shifts accordingly while the SD is unchanged.]
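The transformation can be sketched in R, taking $W$ to be the sum of the pooled-sample ranks of the first group (one common definition; textbooks vary). The toy vectors are hypothetical. Note that the value R's `wilcox.test` prints under the name W is already the shifted, $U$-type statistic.

```r
# Hypothetical toy data with ties (not the survey data).
x <- c(1, 3, 3, 5, 6)
y <- c(2, 3, 4, 4, 7, 1)
n1 <- length(x)

r <- rank(c(x, y))                   # midranks in the pooled sample
W_ranksum <- sum(r[seq_len(n1)])     # rank sum of the first group
U <- W_ranksum - n1 * (n1 + 1) / 2   # constant shift gives the Mann-Whitney U

# R's wilcox.test() reports the shifted value under the name W.
W_r <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
c(W_ranksum = W_ranksum, U = U, W_r = W_r)
```

Here the rank sum is $30.5$, and subtracting $n_1(n_1+1)/2 = 15$ gives $U = 15.5$, the same number R reports.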