Calculating statistical significance

I have a dataset that holds the number of people in a group who used a type of word in a survey at three different data collection points:

People in Group - 158

People who used a word at T0 = 123

People who used a word at T1 = 65

People who used a word at T2 = 54

Does anyone know what types of statistical analysis I can do in order to work out the statistical significance of the drop from T0 to T2?

(T0 is before an intervention, T1 directly after and T2 3 months later)


As I mentioned in a Comment, it would be best to have results for individual subjects in order to do an analysis that uses all the information available. (If you intend to publish your results in a journal of quality, referees may insist on that.)

However, it seems that an analysis of your data in its present form, based on fairly simple confidence intervals, is sufficient to show differences among your three time periods.

Initial survey. The fraction of subjects using stereotype words on the first survey is $p_1 = 123/158 = 0.778.$ Based on that, a 95% Agresti confidence interval for the true population proportion is $(0.707, 0.836).$

Final survey. Similarly, a 95% Agresti confidence interval for the true population proportion based on $p_3 = 54/158 = 0.342$ is $(0.272, 0.419).$ The two confidence intervals are very far from overlapping, so there is good evidence that the fraction of subjects using stereotype words has decreased.

Agresti confidence intervals. In general, here is how to make an Agresti confidence interval (CI) if there are $X$ events in $n$ trials: Let $\tilde p = \frac{X+2}{n+4}.$ Then the CI is of the form $$\tilde p \pm 1.96\sqrt{\tilde p(1-\tilde p)/(n+4)}.$$
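As a rough check, the recipe above can be sketched in a few lines of Python. The function name `agresti_ci_95` is just a label chosen for this illustration, not part of any library; the counts are the ones given in the question.

```python
from math import sqrt


def agresti_ci_95(x, n):
    """95% Agresti interval: add 2 events and 4 trials, then use +/- 1.96 SE."""
    n_adj = n + 4
    p_tilde = (x + 2) / n_adj
    half_width = 1.96 * sqrt(p_tilde * (1 - p_tilde) / n_adj)
    return p_tilde - half_width, p_tilde + half_width


n = 158                          # people in the group
ci_t0 = agresti_ci_95(123, n)    # users at T0
ci_t2 = agresti_ci_95(54, n)     # users at T2

print("T0: (%.3f, %.3f)" % ci_t0)   # T0: (0.707, 0.836)
print("T2: (%.3f, %.3f)" % ci_t2)   # T2: (0.272, 0.419)
```

The upper end of the T2 interval lies well below the lower end of the T0 interval, which is the non-overlap used in the argument above.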

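Bonferroni comparisons. For a 98.3% Agresti CI, the multiplier 1.96 is replaced by $z = 2.394,$ the value that cuts probability $0.05/6 \approx 0.0083$ from each tail of the standard normal distribution. The numerator of $\tilde p$ becomes $X + z^2/2 = X + 2.866$ and the denominator becomes $n + z^2 = n + 5.73.$ Then the interval is of the form $$\tilde p \pm 2.394\sqrt{\tilde p(1 - \tilde p)/(n+5.73)}.$$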

If you are going to compare three confidence intervals and want an overall error probability of 5%, then you should use 98.3% CIs, because $1 - 0.05/3 \approx 0.983.$ (This is known as the Bonferroni method.) Of course, 98.3% CIs will be a little longer than 95% CIs, but not enough longer to make the interval for T0 overlap the interval for T1 or T2. (The T1 and T2 intervals overlap each other even at the 95% level, so this approach does not separate those two time periods.)
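The Bonferroni comparison can be sketched the same way. This is only an illustration under stated assumptions: the helper names `adjusted_ci` and `overlaps` are made up for the example, the T1 count of 65 comes from the question, and the multiplier is taken as the two-sided normal quantile for level $1 - 0.05/3$ using `statistics.NormalDist` from the Python standard library (3.8 or later).

```python
from math import sqrt
from statistics import NormalDist


def adjusted_ci(x, n, conf_level):
    """Agresti-style interval: add z^2/2 events and z^2 trials, then +/- z SE."""
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_adj = n + z ** 2
    p_tilde = (x + z ** 2 / 2) / n_adj
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / n_adj)
    return p_tilde - half_width, p_tilde + half_width


def overlaps(a, b):
    """True if intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


n = 158
counts = {"T0": 123, "T1": 65, "T2": 54}
level = 1 - 0.05 / len(counts)   # about 0.983 for three intervals

intervals = {label: adjusted_ci(x, n, level) for label, x in counts.items()}
for label, (lo, hi) in intervals.items():
    print(f"{label}: ({lo:.3f}, {hi:.3f})")

print("T0 vs T2 overlap:", overlaps(intervals["T0"], intervals["T2"]))
print("T1 vs T2 overlap:", overlaps(intervals["T1"], intervals["T2"]))
```

With these counts the T0 interval stays clear of both later intervals, while the T1 and T2 intervals overlap each other.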

If you want more on the 'Agresti' and 'Bonferroni' methods, you can look in a recent intermediate-level applied statistics text or google the names. Also, for more on Agresti CIs see this page.