Can a significance test be applied, and which one?


Collective wisdom is desperately needed! I need to understand whether some kind of significance test is applicable here and, if so, which test.

The data collection was devised as follows: There are 3 stations where people perform a number of station-specific manual tasks.

On each of the three test days, different people were assigned to the stations (9 people in total). The people differed in height, age and weight.

Each person (at each station on each day) was subjected to 5000 measurements of his kinetic activity, later classified as "good" or "bad". The percentages of "good" and "bad" for each person (and day and station) are thus known.

3 people from day one are treated as a "control group", or baseline.

I would like to know whether a significance test is possible comparing the percentages at station x on day one (the baseline person working at that station on day one) with the percentages at the same station on a different day, for a different person.

Say station 1 is "drill and rotate". On day one, John worked there alone and had 80% good drill-and-rotate movements and 20% bad ones (out of 5000 measured). On day two, Jill worked there, received ongoing ergonomic instructions, and had 85% and 15%, respectively.

The null hypothesis is (I am guessing) "the difference can be attributed to natural variation among humans only". I want to test the significance of "had ergonomic instructions".

Can it be done with such a setup, and with which formula? Or is the experimental design faulty, so that such testing is impossible?

Please help. I am no statistician and I am at my wits' end.


Not nearly enough information for a definitive answer:

Days. Are there designed differences among the three Days (that carry across stations)? [You hint that John (Day 1) was somehow treated differently than Jill (Day 2). What about Day 3?]

Stations. Are there designed differences among the three Stations (that carry across days)? [You hint that this may be so. Station 1 is 'drill and rotate'. What about Stations 2 and 3?]

Tentative model. From what (little) I can gather from your description, I guess that this may be a two-factor ANOVA (3 Stations $\times$ 3 Days) with one observation per 'cell'. The model might be $$Y_{ij} = \mu + \alpha_i + \beta_j + e_{ij},$$ for $i = 1,2,3$ days and $j = 1,2,3$ stations, where independently $e_{ij} \sim \mathsf{Norm}(0, \sigma).$

The ANOVA table would have three rows: Days, Stations, Error. Because there is only one observation per cell, there would be no interaction term.

Data. The data would be fractions Good out of 5000. Although these are technically binomial proportions, I guess my first try at analysis would be to treat the data as normal (as assumed in the tentative model), because the number of trials per subject is so large. I would certainly want to check whether the nine residuals seem consistent with normality.

Contrast. It seems that Day 1 ('treated as a control') might be treated differently from Days 2 and 3. If that is so, and the Day effect is significant, you might want to test whether the designed contrast comparing Day 1 against the average of the other two, based on the coefficient vector $c=(1, -.5, -.5)$, is significant.
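As a minimal sketch of the contrast arithmetic, the snippet below uses the nine percentages the asker posted later in the thread and the error mean square of the additive model above. The estimate is $\sum_i c_i \bar{Y}_{i\cdot}$, and because each Day mean averages 3 observations its standard error is $\sqrt{\mathrm{MSE}\sum_i c_i^2/3}$, with $(3-1)(3-1)=4$ error degrees of freedom. The function name `day_contrast` is hypothetical, and the test is unadjusted; it only illustrates the computation.

```python
import math

# Percent-good values from the table posted later in the thread, as fractions.
# Rows are Days 1-3; columns are Stations 1-3.
data = [
    [0.7403, 0.4583, 0.6582],
    [0.7483, 0.8024, 0.7255],
    [0.7544, 0.7645, 0.7273],
]

def day_contrast(y, c):
    """Estimate, standard error and t statistic for a contrast c on the
    row (Day) means, using the error mean square of the additive model."""
    r, s = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * s)
    row_means = [sum(row) / s for row in y]
    col_means = [sum(y[i][j] for i in range(r)) / r for j in range(s)]
    # Residual sum of squares of Y_ij = mu + alpha_i + beta_j + e_ij.
    sse = sum(
        (y[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(s)
    )
    df_err = (r - 1) * (s - 1)
    mse = sse / df_err
    est = sum(ci * m for ci, m in zip(c, row_means))
    # Each Day mean averages s observations, so Var(est) = MSE * sum(c^2) / s.
    se = math.sqrt(mse * sum(ci * ci for ci in c) / s)
    return est, se, est / se, df_err

est, se, t, df = day_contrast(data, [1, -0.5, -0.5])
print(f"estimate = {est:.4f}, SE = {se:.4f}, t = {t:.2f} on {df} df")
```

With these numbers $|t|$ comes out near 1.96, below the two-sided 5% critical value $t_{0.975,\,4} \approx 2.776$, so the contrast would not be significant either.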

Design flaws. Without seeing the data or understanding the nature of the effects under study, I'm guessing that it would have been better to have five times as many subjects, each doing 1000 trials. Unless effects are profoundly large, or variability is much smaller than is usual with human subjects, I suspect that one subject in each of the $3 \times 3$ cells will not provide enough power to find significant differences. (Even twice as many subjects, each doing 2500 trials, giving 2 observations per cell, would have been a lot better.)

Are any apparent differences among subjects due to personal differences, or are they due to different performances of the same subject across time? With this design we'll never know. Did anyone stop to think that three randomly chosen subjects are hardly enough for a 'baseline' of any kind?

The time to think about analyzing data from a study is before the study is done. It is regrettable if so much effort has been expended with no clear model or strategy for analysis in mind.

stations    day1        day2        day3
1           74.03%      74.83%      75.44%
2           45.83%      80.24%      76.45%
3           65.82%      72.55%      72.73%

The percentages are proportions of "good" movements at the corresponding day and station. People working the stations on day 1 were not assisted by anything; people working on day 2 were given instructions; people working on day 3 were given visual aids.

Question: did the "ergonomic interference" influence the proportion of "good" movements?


Here is output from Minitab 17 based on the model in my previous answer. Data were entered as percents, just as you provided them.

The P-values for the Day and Station effects (0.260 and 0.681) are both above 5%. This indicates that no significant differences were found among Days or among Stations.

Differences between %-Good scores for individuals fluctuate so much, apparently at random, that it is not possible to discern systematic differences among Days or Stations above that 'noise'.

ANOVA: Good versus Day, Station 

Factor   Type   Levels  Values
Day      fixed       3  1, 2, 3
Station  fixed       3  1, 2, 3


Analysis of Variance for Good

Source   DF        SS        MS     F      P
Day       2  0.036492  0.018246  1.93  0.260
Station   2  0.008025  0.004012  0.42  0.681
Error     4  0.037910  0.009478
Total     8  0.082427


S = 0.0973528   R-Sq = 54.01%   R-Sq(adj) = 8.02%
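As a cross-check, the table above can be reproduced from the nine percentages with a few lines of standard-library Python. This is a sketch of the textbook two-way ANOVA without replication, not Minitab's code; the function name `anova_no_replication` is hypothetical. The P-values use the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$, which holds only for an $F(2, d)$ variable, i.e. exactly 3 levels per factor as in this design.

```python
# Percent-good values as fractions; rows are Days 1-3, columns Stations 1-3.
data = [
    [0.7403, 0.4583, 0.6582],
    [0.7483, 0.8024, 0.7255],
    [0.7544, 0.7645, 0.7273],
]

def anova_no_replication(y):
    """Two-way ANOVA with one observation per cell (additive model,
    no interaction). Returns {source: (df, SS, MS, F, P)}."""
    r, s = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * s)
    row_means = [sum(row) / s for row in y]
    col_means = [sum(y[i][j] for i in range(r)) / r for j in range(s)]
    ss_day = s * sum((m - grand) ** 2 for m in row_means)
    ss_station = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_total - ss_day - ss_station
    df_err = (r - 1) * (s - 1)
    ms_err = ss_err / df_err
    table = {}
    for name, ss, df in (("Day", ss_day, r - 1), ("Station", ss_station, s - 1)):
        if df != 2:
            raise ValueError("closed-form P-value below needs exactly 3 levels")
        f = (ss / df) / ms_err
        # For an F(2, d) variable, P(F > f) = (1 + 2f/d) ** (-d/2).
        p = (1 + 2 * f / df_err) ** (-df_err / 2)
        table[name] = (df, ss, ss / df, f, p)
    table["Error"] = (df_err, ss_err, ms_err, None, None)
    table["Total"] = (r * s - 1, ss_total, None, None, None)
    return table

for source, (df, ss, ms, f, p) in anova_no_replication(data).items():
    line = f"{source:8s} {df:2d}  {ss:.6f}"
    if ms is not None:
        line += f"  {ms:.6f}"
    if f is not None:
        line += f"  {f:.2f}  {p:.3f}"
    print(line)
```

For a factor with more than 3 levels, the P-value would instead come from a general F tail function (for example `scipy.stats.f.sf`).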

A normal plot of the residuals from the model shows that the nine residuals fall very nearly in a straight line. That is, residuals are consistent with normality.
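A normal probability plot pairs the sorted residuals $e_{ij} = Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot}$ with standard-normal quantiles. The sketch below computes those points with the standard library; the plotting positions $(k - 3/8)/(n + 1/4)$ (Blom's) are an assumption, and Minitab's default may differ slightly. The correlation of the plotted points is only a rough numerical summary of straightness, not a formal test, and the helper names are hypothetical.

```python
from statistics import NormalDist

# Percent-good values as fractions; rows are Days 1-3, columns Stations 1-3.
data = [
    [0.7403, 0.4583, 0.6582],
    [0.7483, 0.8024, 0.7255],
    [0.7544, 0.7645, 0.7273],
]

def additive_residuals(y):
    """Residuals e_ij = y_ij - rowmean_i - colmean_j + grand mean."""
    r, s = len(y), len(y[0])
    grand = sum(map(sum, y)) / (r * s)
    row_means = [sum(row) / s for row in y]
    col_means = [sum(y[i][j] for i in range(r)) / r for j in range(s)]
    return [y[i][j] - row_means[i] - col_means[j] + grand
            for i in range(r) for j in range(s)]

def normal_plot_points(resid):
    """Pair sorted residuals with standard-normal quantiles at Blom's
    plotting positions (k - 3/8) / (n + 1/4), k = 1..n."""
    n = len(resid)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]
    return list(zip(z, sorted(resid)))

def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / (sxx * syy) ** 0.5

resid = additive_residuals(data)
points = normal_plot_points(resid)
for z, e in points:
    print(f"normal score {z:+.3f}   residual {e:+.4f}")
print(f"plot correlation: {plot_correlation(points):.3f}")
```

The squared residuals sum to the Error SS in the ANOVA table, which is a quick way to confirm the residuals were computed from the same model.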


So that you can check the data I used, here is a table. Margins are row and column averages.

Rows: Day   Columns: Station

            1       2       3     All

1      0.7403  0.4583  0.6582  0.6189
            1       1       1       3

2      0.7483  0.8024  0.7255  0.7587
            1       1       1       3

3      0.7544  0.7645  0.7273  0.7487
            1       1       1       3

All    0.7477  0.6751  0.7037  0.7088
            3       3       3       9

Cell Contents:  Good  :  Mean
                         Count

Note: Individual 95% confidence intervals for your percentages (based on 5000 trials each) are about $\pm$1.5%. There are ${9 \choose 2} = 36$ pairs of individuals. I certainly don't encourage you to check all pairs for significant differences, because by chance alone you would likely find several 'significant' differences that way. However, if there is one specific pair of subjects that was of particular interest (before you saw the data), you might look at that one.
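Both numbers in this note can be checked with a short sketch, assuming the usual normal-approximation (Wald) interval with half-width $1.96\sqrt{p(1-p)/n}$. Since $p(1-p)$ is largest at $p = 0.5$, the widest interval for $n = 5000$ is about $\pm 1.39$ percentage points, so "about $\pm$1.5%" is a slightly rounded-up bound. With 36 pairwise tests at the 5% level, about $36 \times 0.05 = 1.8$ false 'significant' results are expected when no real differences exist (this follows from linearity of expectation, so the tests need not be independent). The helper name `wald_half_width` is hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

def wald_half_width(p, n):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return Z * math.sqrt(p * (1 - p) / n)

n = 5000
# p(1 - p) peaks at p = 0.5, so that gives the widest interval.
for p in (0.50, 0.75, 0.85):
    print(f"p = {p:.2f}: +/- {100 * wald_half_width(p, n):.2f} percentage points")

n_pairs = math.comb(9, 2)
# Expected number of false positives among all pairwise tests at the 5% level.
print(f"{n_pairs} pairs, about {0.05 * n_pairs:.1f} false positives expected")
```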

For example, if the difference between John and Jill (mentioned individually in your Question) is of particular importance, then a test shows that they gave significantly different proportions of Good responses (P-value very small):

Test and CI for Two Proportions 

 Sample     X     N  Sample p
 1       4000  5000  0.800000
 2       4250  5000  0.850000

 Difference = p (1) - p (2)
 Estimate for difference:  -0.05
 95% CI for difference:  (-0.0648622, -0.0351378)
 Test for difference = 0 (vs ≠ 0):  Z = -6.59  P-Value = 0.000
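The two-proportion output above can be reproduced with a short sketch. The counts 4000 and 4250 are 80% and 85% of 5000, as in the question. The numbers are consistent with an unpooled standard error $\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}$ used for both the CI and $Z$; that is inferred from the arithmetic, not from Minitab's documentation (a pooled standard error would give $Z \approx -6.58$). The function name `two_prop_z` is hypothetical. This test treats each of the 5000 movements as an independent trial and compares only these two individuals, so it cannot by itself separate an effect of instructions from person-to-person differences.

```python
import math
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2, conf=0.95):
    """Difference of two proportions with a Wald CI and Z test,
    both using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    z = diff / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided
    return diff, (diff - z_crit * se, diff + z_crit * se), z, p_value

diff, ci, z, p = two_prop_z(4000, 5000, 4250, 5000)
print(f"diff = {diff:.4f}, CI = ({ci[0]:.6f}, {ci[1]:.6f}), Z = {z:.2f}, P = {p:.2g}")
```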

BTW: I don't find John and Jill in the data you provided most recently.