Is the difference between two sets of measurements significant?


Consider the following experimental setting: I have two machines $m_0$ and $m_1$ and would like to know which one performs better. To find out, I set up an experiment that measures the time a machine takes to perform a certain task. I ran the same experiment 100 times and logged the time measurements, e.g. $t_0 = [0.1, 0.3, 0.2, \dots]$ for machine $m_0$ and $t_1 = [0.4, 0.1, 0.2, \dots]$ for machine $m_1$.

In statistics I learned about the null and alternative hypotheses $H_0$ and $H_1$ and about statistical significance, but there the hypotheses were always given (as something the researcher assumed about the real world a priori), and I am not able to apply what I learned to my setting.

How can I check whether $m_1$ performs significantly better than $m_0$, in the statistical sense? How should I choose my hypotheses, and how would I perform the hypothesis test?

Best answer

This is a good case for the nonparametric Wilcoxon rank-sum test. I'll outline what you can do:

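  1. Your null hypothesis is that there is no difference between the machines, i.e. both sets of times come from the same distribution: $H_0: F_1 = F_2$, where $F_1$ and $F_2$ are the distributions of the machine 1 and machine 2 times (so in particular $E[t_1]=E[t_2]$). The alternative is that machine 1 tends to be faster; if the two distributions differ only by a shift, this reads $H_a: \operatorname{median}(t_1)<\operatorname{median}(t_2)$. Under $H_0$ all times are draws from one distribution, so every assignment of ranks to the two machines is equally likely. This justifies the next step.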
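  2. Rank the times in the combined dataset (rank 1 for the shortest time; tied times get the average of the ranks they span), then add up the ranks that belong to the machine 1 times. This sum is your rank-sum statistic $W$. Below, $n$ is the number of machine 1 times and $m$ the number of machine 2 times.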
  3. Since your samples are large (100 times each), the distribution of $W$ under the null hypothesis is well approximated by a normal distribution, so we use the large-sample approximation $W\;\dot{\sim}\;\mathcal{N}\left(\frac{n(m+n+1)}{2},\frac{mn(m+n+1)}{12}\right)$, i.e. mean $n(m+n+1)/2$ and variance $mn(m+n+1)/12$.
  4. Standardize $W$ to get $W^*=\frac{W-n(m+n+1)/2}{\sqrt{mn(m+n+1)/12}}$, which is approximately standard normal under $H_0$.
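  5. Check whether $W^*$ is below the lower-tailed $z$-test cutoff $-z_{\alpha}$, where $\alpha$ is your significance level (typically $0.05$ or $0.01$) and $z_{\alpha}$ is the upper $\alpha$ quantile of the standard normal distribution ($z_{0.05}\approx 1.645$, $z_{0.01}\approx 2.326$). If it is, reject $H_0$: machine 1 performs better (is faster) than machine 2 at significance level $\alpha$.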
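Steps 2 to 5 can be sketched in plain Python (standard library only; `statistics.NormalDist` needs Python 3.8+). The helper names `average_ranks` and `rank_sum_test` are hypothetical, not a library API. Pass the times of the machine you suspect is faster as `t1` (machine 1) and the other machine's times as `t2`. The sketch uses the untied variance from step 3, so with many tied times the result is only approximate, and the data in the `__main__` block is simulated for illustration, not real measurements.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their average.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(t1, t2, alpha=0.05):
    """Lower-tailed Wilcoxon rank-sum test, large-sample normal approximation.

    H0: t1 and t2 come from the same distribution.
    Ha: t1 (machine 1) tends to be smaller, i.e. machine 1 is faster.
    No tie correction is applied to the variance (matches step 3).
    """
    n, m = len(t1), len(t2)
    ranks = average_ranks(list(t1) + list(t2))
    w = sum(ranks[:n])                         # step 2: rank sum of machine 1 times
    mean = n * (m + n + 1) / 2                 # step 3: null mean of W
    sd = math.sqrt(m * n * (m + n + 1) / 12)   # step 3: null standard deviation of W
    w_star = (w - mean) / sd                   # step 4: standardized statistic
    cutoff = -NormalDist().inv_cdf(1 - alpha)  # step 5: -z_alpha
    p_value = NormalDist().cdf(w_star)         # one-sided (lower-tail) p-value
    return w, w_star, p_value, w_star < cutoff


if __name__ == "__main__":
    import random

    random.seed(1)
    # Simulated (hypothetical) times; machine 1 is made slightly faster on average.
    machine1 = [random.gauss(0.20, 0.05) for _ in range(100)]
    machine2 = [random.gauss(0.23, 0.05) for _ in range(100)]
    w, w_star, p, reject = rank_sum_test(machine1, machine2, alpha=0.05)
    print(f"W = {w:.1f}, W* = {w_star:.3f}, one-sided p = {p:.4f}, reject H0: {reject}")
```

As a cross-check, SciPy offers `scipy.stats.mannwhitneyu(t1, t2, alternative='less')`, the equivalent Mann-Whitney U test. In recent SciPy versions its statistic is $U = W - n(n+1)/2$ rather than $W$, and its normal approximation applies a continuity correction and a tie correction by default, so its p-value can differ slightly from this sketch.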