I need to statistically analyze some simple data for a paper, but I am not sure the best way to do so.


I have some simple data for a paper. It is in the format as follows:

Name  | Numerical Quality One | Numerical Quality Two | Yes or No? | Yes or No? |

Bob   |                   4.5 |                     3 |        Yes |         No |
Jenny |                     5 |                     2 |          - |         No |
Steve |                     3 |                     5 |        Yes |        Yes | 

And so on for ~18 people or so. The numerical qualities are in a set range between 1 and 5, and are not necessarily correlated. (They are the answers to a question where the format is 1 for strongly disagree, and 5 for strongly agree). Some people neglected to answer some questions, in which case I have a dash "-" for their data on that question.

My professor expects a detailed analysis of this data. However, I don't know what kind of chart or graph to give him beyond the average and median for each question; I haven't taken statistics in several years and I don't know how best to analyze it.

I have some freedom in my analysis so any suggestion for a type of graph/chart/math which I can apply to analyze this data would be appreciated.

Thank you.


You have some work to do, deciding what you want to know from your data. Here are some examples of possible tests.

Wilcoxon test for Likert data:

x1 = c(2,3,3,4,3,2, 3,5,5,3,2,3, 1,2,2,4,5,4) 
x2 = c(2,2,2,3,1,2, 2,5,4,1,3,1, 2,1,1,2,3,2)

For many of the 18 subjects, scores on the second question seem less favorable than scores on the first.
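To give your professor something graphical, one simple option is a grouped barplot of the two score distributions. This is only a sketch using the illustrative vectors x1 and x2 above, not your actual data:

# Count how many subjects gave each score 1-5 on each question
counts = rbind(Q1 = table(factor(x1, levels = 1:5)),
               Q2 = table(factor(x2, levels = 1:5)))
# Grouped barplot of the two Likert distributions
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "Likert score", ylab = "Number of subjects")

Using factor(..., levels = 1:5) ensures that scores nobody chose still appear as zero-height bars.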

A Wilcoxon signed rank test finds significantly greater scores on the first question than on the second: the P-value is about $0.002 < 0.05 = 5\%$, so you can reject, at the 5% level, the null hypothesis of no difference in favor of the alternative that the first question scores higher. However, the Wilcoxon test does not perform well when there are many tied values, so some people might quibble with this conclusion.

wilcox.test(x1, x2, alt="greater", paired=TRUE)

      Wilcoxon signed rank test with continuity correction

data:  x1 and x2
V = 110, p-value = 0.001844
alternative hypothesis: true location shift is greater than 0

Warning messages:
1: In wilcox.test.default(x1, x2, alt = "greater", paired = TRUE) :
   cannot compute exact p-value with ties
2: In wilcox.test.default(x1, x2, alt = "greater", paired = TRUE) :
   cannot compute exact p-value with zeroes

d = x1 - x2;  d
[1]  0  1  1  1  2  0  1  0  1  2 -1  2 -1  1  1  2  2  2
sum(d > 0);  sum(d < 0)
[1] 13
[1] 2

Among the 15 subjects who answered differently on the two questions, only 2 gave a higher score on the second question and 13 gave a lower one. A sign test rejects the null hypothesis that the two questions have equivalent responses in favor of the alternative that the first question got more favorable answers: P-value about 0.0037. (Note that the tail probability pbinom is needed here, not the single-point probability dbinom.)

pbinom(2, 15, .5)
[1] 0.003692627
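The same sign test can also be run directly with R's exact binomial test, counting the 2 negative differences among the 15 nonzero ones:

# Sign test via exact binomial: 2 of 15 non-tied pairs favored question 2
binom.test(2, 15, p = 0.5, alternative = "less")

This avoids summing binomial probabilities by hand; it reports a P-value of about 0.0037.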

The output above is from R. Minitab has a formal sign test procedure; its output is shown below, and the P-value is essentially the same:

Sign test of median =  0.00000 versus > 0.00000

      N  Below  Equal  Above       P  Median
Dif  18      2      3     13  0.0037   1.000

Chi-squared test for Yes/No data:

                  Question 2
Question 1         Yes    No        Total
-----------------------------------------
   Yes               4     7          11
   No                5     2           7
-----------------------------------------
Total                9     9          18


TBL = rbind(c(4,7), c(5, 2));  TBL
      [,1] [,2]
[1,]    4    7
[2,]    5    2

A chi-squared test finds no significant pattern of association (positive or negative) between Yes and No answers on the two questions.

chisq.test(TBL)

    Pearson's Chi-squared test 
    with Yates' continuity correction

data:  TBL
X-squared = 0.93506, df = 1, p-value = 0.3336

Warning message:
In chisq.test(TBL) : 
 Chi-squared approximation may be incorrect

Because of the small counts, there is a warning message that the P-value may not be correct. However, the implementation of chisq.test in R allows simulating a more accurate P-value, as below; even so, the test does not find a significant association between answers to the two questions. [Another alternative when counts are small is Fisher's Exact Test.]

chisq.test(TBL, simulate.p.value=TRUE)

    Pearson's Chi-squared test 
    with simulated p-value 
    (based on 2000 replicates)

data:  TBL
X-squared = 2.1039, df = NA, p-value = 0.3343
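For completeness, Fisher's Exact Test can be run on the same table; its P-value does not rely on a large-sample approximation:

fisher.test(TBL)

Here it gives a P-value of about 0.33, so it also finds no significant association between the answers to the two questions.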