A set of jobs (with potentially different runtimes) arrive on a cluster at different times. These are scheduled by different schedulers in different experiments (arrival times and runtimes are exactly the same in each experiment). Depending on the schedule, jobs may wait for resources, get preempted, and get restarted. Under each schedule ($B$), for each job ($j$) we get a completion time ($JCT(j , B)$) as $finish\_time(j, B) - arrival\_time(j)$. (A job's runtime is its ideal JCT, i.e., with no waiting/preemptions/restarts. Note that arrival time does not depend on the schedule.)
We compute %reduction in job's completion time from schedule W to B as (read B as better and W as worse): $\left(1 - \frac{JCT(j, B)}{JCT(j, W)}\right) * 100$. %reduction allows us to compare impact on jobs with different runtimes as opposed to absolute change in JCT (i.e., $JCT(j, W) - JCT(j, B)$).
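As a quick sketch of the definition above (the JCT values here are hypothetical, just for illustration):

```python
# Hypothetical job: under schedule W it finishes 8s after arrival,
# under schedule B it finishes 5s after arrival.
jct_w = 8.0  # JCT(j, W) = finish_time(j, W) - arrival_time(j)
jct_b = 5.0  # JCT(j, B) = finish_time(j, B) - arrival_time(j)

# %reduction from W to B: (1 - JCT(j, B) / JCT(j, W)) * 100
pct_reduction = (1 - jct_b / jct_w) * 100
print(pct_reduction)  # 37.5
```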
What is the correct way to aggregate %reduction across jobs? We want to use the aggregates to gauge mainly two things:
- Evaluate a particular scheduler as we change its parameters or the workload's parameters (e.g., arrival rate, runtime distribution).
- Consider the absolute value of the aggregate to compare different schedulers.
Choices I can think of:
- $ArithmeticMean_j\left(\left(1 - \frac{JCT(j, B)}{JCT(j, W)}\right) * 100\right)$. This does not seem meaningful: say we have job1 with %reduction of 90% (10x speedup, i.e., $\frac{JCT(1, B)}{JCT(1, W)}=\frac{1}{10}$) and job2 with %reduction of -100% (2x slowdown, i.e., $\frac{JCT(2, B)}{JCT(2, W)}=\frac{2}{1}$); then this aggregate would be -5%, whereas the improvement in job1's JCT is much larger than the degradation in job2's JCT, so I would expect the aggregate to show a net improvement.
- $\left(1 - GeometricMean_j\left(\frac{JCT(j, B)}{JCT(j, W)}\right)\right) * 100$. This seems the most appropriate, but I don't know how to derive it. This would correlate with the mean speedup, i.e., $GeometricMean_j\left(\frac{JCT(j, W)}{JCT(j, B)}\right)$.
- $\left(1 - \frac{ArithmeticMean_j(JCT(j, B))}{ArithmeticMean_j(JCT(j, W))}\right) * 100$. Basically %reduction in mean JCT. This perhaps loses information about jobs with different runtimes, i.e., improvements made on short jobs might be overshadowed by improvements made on long jobs.
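To make the three choices concrete, here is a small sketch that computes each aggregate on the two-job example from the first bullet (the absolute JCT values, 10→1 and 1→2, are assumed for illustration; only their ratios come from the example):

```python
import math

# job1: 10x speedup (JCT_W=10, JCT_B=1); job2: 2x slowdown (JCT_W=1, JCT_B=2).
jct_w = [10.0, 1.0]
jct_b = [1.0, 2.0]
ratios = [b / w for b, w in zip(jct_b, jct_w)]  # JCT(j, B) / JCT(j, W)

# 1. Arithmetic mean of per-job %reductions.
am = sum((1 - r) * 100 for r in ratios) / len(ratios)

# 2. Geometric mean of per-job JCT ratios, converted to a %reduction.
gm = (1 - math.prod(ratios) ** (1 / len(ratios))) * 100

# 3. %reduction in mean JCT.
mean_red = (1 - (sum(jct_b) / len(jct_b)) / (sum(jct_w) / len(jct_w))) * 100

print(round(am, 1), round(gm, 1), round(mean_red, 1))  # -5.0 55.3 72.7
```

Note how the three aggregates disagree even in sign on the same schedules: the arithmetic mean reports a net degradation (-5%), while the geometric-mean form reports a net improvement (~55%), and the %reduction in mean JCT (~73%) is dominated by the long job.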