Check for statistical significance in this scientific work.


I will have a few minutes to present a poster at a scientific meeting taking place in a few days.

I gathered some data on a particular medical procedure and its outcomes.

It has been reported that outcomes worsen with the age of the patient (the oldest patients have the worst prognosis).

I would like to check my data for statistical significance, but a few hours of research got me nowhere. Any hint is highly appreciated. Coding: 0 = good outcome; 1 = bad outcome.

AGE OUTCOME
80  0
32  1
21  1
69  0
62  0
76  0
74  0
56  0
78  0
58  1
45  0
50  1
36  0
63  0
86  0
53  0
60  0
69  0
71  0
90  1
76  0
76  0
80  1
81  0
75  0
78  0
77  0
56  1
44  0
20  1
77  0
45  0
59  0
19  0
82  1
42  1
62  0

Best answer

First check whether age is approximately normally distributed within each outcome group, since a two-sample t-test would rely on that. A normal Q-Q plot of the ages in group $0$ indicates a violation of normality

[Figure: normal Q-Q plot of AGE for outcome group 0]

and a Shapiro-Wilk test confirms it:

> shapiro.test(df$AGE[df$OUTCOME==0])

    Shapiro-Wilk normality test

data:  df$AGE[df$OUTCOME == 0]
W = 0.89618, p-value = 0.01095
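
The transcript above uses a data frame `df` that is never constructed in the answer. A minimal sketch of one way to reproduce the check, assuming the 37 rows from the question are typed in by hand (the name `age_good` is only illustrative, and the plot title is an arbitrary choice):

```r
# Data transcribed from the question (37 patients); 0 = good outcome, 1 = bad outcome.
df <- data.frame(
  AGE = c(80, 32, 21, 69, 62, 76, 74, 56, 78, 58, 45, 50, 36, 63, 86, 53, 60, 69, 71,
          90, 76, 76, 80, 81, 75, 78, 77, 56, 44, 20, 77, 45, 59, 19, 82, 42, 62),
  OUTCOME = c(0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
              1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0)
)

# Ages of the patients with a good outcome (hypothetical helper variable).
age_good <- df$AGE[df$OUTCOME == 0]

# Visual check: points far from the reference line suggest non-normality.
qqnorm(age_good, main = "Normal Q-Q Plot, OUTCOME == 0")
qqline(age_good)

# Formal check: Shapiro-Wilk test on the same ages.
sw <- shapiro.test(age_good)
print(sw)
```

The same two steps can be repeated with `df$OUTCOME == 1` for the other group, although with only 10 patients there the test has little power to detect non-normality.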

Since the ages in group $0$ are not normally distributed, we continue with a nonparametric test. The Wilcoxon rank sum test is an appropriate choice for comparing two independent groups.

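Let $\mu_1$ and $\mu_2$ denote the location (center) of the age distribution in outcome groups $0$ and $1$, respectively, and test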

$$H_0:\mu_1 =\mu_2$$

$$H_a : \mu_1 \neq \mu_2$$

    Wilcoxon rank sum test with continuity correction

data:  AGE by OUTCOME
W = 169, p-value = 0.2516
alternative hypothesis: true location shift is not equal to 0

so, at the conventional 5% level, we do not have significant evidence that the age distributions of the two outcome groups differ in location.
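
The call that produced the Wilcoxon output above is not shown. Judging by its `data:  AGE by OUTCOME` line, it was probably the formula interface of `wilcox.test`; the sketch below assumes that. Setting `exact = FALSE` is also an assumption, made here to request the normal approximation explicitly because the ages contain ties. As a cross-check, the sketch recomputes W by hand as the number of (group 0, group 1) patient pairs in which the group 0 patient is older, with ties counted as one half. The names `x`, `y`, and `W_manual` are illustrative.

```r
# Data transcribed from the question (37 patients); 0 = good outcome, 1 = bad outcome.
df <- data.frame(
  AGE = c(80, 32, 21, 69, 62, 76, 74, 56, 78, 58, 45, 50, 36, 63, 86, 53, 60, 69, 71,
          90, 76, 76, 80, 81, 75, 78, 77, 56, 44, 20, 77, 45, 59, 19, 82, 42, 62),
  OUTCOME = c(0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
              1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0)
)

# Two-sided Wilcoxon rank sum test of AGE between the two outcome groups.
# exact = FALSE (an assumption) asks for the normal approximation up front.
wt <- wilcox.test(AGE ~ OUTCOME, data = df, exact = FALSE)
print(wt)

# Manual W: pairs where the group 0 age exceeds the group 1 age, plus half the ties.
x <- df$AGE[df$OUTCOME == 0]
y <- df$AGE[df$OUTCOME == 1]
W_manual <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))
print(W_manual)  # 169, the W statistic reported above
```

Out of the 27 x 10 = 270 possible pairs, W = 169 means the good-outcome patient is the older one in roughly 63% of them, which is the direction discussed next.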

In fact, the median age in outcome group $1$ (bad outcome) is lower than in group $0$ (good outcome), which is the opposite of the trend you describe.

> median(df$AGE[df$OUTCOME==1])
[1] 53
> median(df$AGE[df$OUTCOME==0])
[1] 69