Correctness of a greedy algorithm for minimizing average completion time in a scheduling problem with release times


I'm self-studying CLRS (Intro to Algorithms, 3rd ed.) and am working on the last part of Problem 16-2. I've constructed an algorithm that seems to be correct, but I'm not sure how to prove its correctness rigorously.

Here's a summary: we are given a set $S = \{a_1, a_2, \ldots, a_n\}$ of tasks, where task $a_i$ requires $p_i > 0$ units of processing time to complete and cannot start before its release time $r_i$. We also allow preemption: a running task can be suspended and resumed later, so its processing time may be split into multiple "pieces".

We want to schedule the tasks to minimize the total completion time. For example, if we start task $a_i$ at time $r_i + 1$ and run it until it finishes, its completion time is $c_i = r_i + 1 + p_i$. Alternatively, we can start the task at $r_i + 0.5$, run it for duration $p_i / 2$, run another task $a_j$ for duration $m$, then come back and finish the other half of $a_i$, for a completion time of $c_i = r_i + 0.5 + m + p_i$.
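To make the arithmetic concrete, here is a tiny sketch with made-up numbers (the values $r_i = 0$, $p_i = 4$, $m = 2$ are my own, not from the book):

```python
# Made-up instance: r_i = 0, p_i = 4, interleaved task a_j runs for m = 2.
r_i, p_i, m = 0, 4, 2

# Run a_i without interruption, starting at r_i + 1.
c_uninterrupted = r_i + 1 + p_i                    # 5

# Start a_i at r_i + 0.5, run half of it, run a_j for m, then finish a_i.
c_preempted = (r_i + 0.5) + p_i / 2 + m + p_i / 2  # 6.5, i.e. r_i + 0.5 + m + p_i

print(c_uninterrupted, c_preempted)
```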

An earlier part of the same problem covers the special case $r_i = 0$ for all $i$, i.e. all tasks are released at the same time. The solution there is simply to sort the tasks by $p_i$ and run them in non-decreasing order, so that the shortest tasks are completed first. It's straightforward to prove that this is correct: the total completion time is a double sum of the $p_i$, and if there is an inversion, i.e. $p_i > p_j$ but $a_i$ is scheduled before $a_j$, then swapping the two decreases the total completion time.
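A minimal sketch of that special case, assuming tasks are given just as a list of processing times (the function name is mine, not from CLRS):

```python
def total_completion_time_all_released(processing_times):
    """Sort by processing time (shortest first) and return the total completion time."""
    total = 0
    finish = 0
    for p in sorted(processing_times):
        finish += p      # this task finishes after all shorter tasks
        total += finish  # add its completion time to the total
    return total

print(total_completion_time_all_released([3, 1, 2]))  # completions 1, 3, 6 -> 10
```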

For the general $r_i$ case, a greedy algorithm that immediately comes to mind is: always run the task with the smallest remaining processing time, chosen from among the incomplete tasks that have already been released. I have a strong hunch that this gives an optimal solution, but I'm not sure how to prove it. The different release times and the ability to split processing into chunks make it hard to apply a straightforward inversion argument like the one above.
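For what it's worth, here is a sketch of that greedy rule (shortest remaining processing time first) as an event-driven simulation. The representation of tasks as (release_time, processing_time) pairs and all names are my own assumptions; the sketch only considers preempting at release times, since remaining times only decrease while a task is running.

```python
import heapq

def srpt_total_completion_time(tasks):
    """Preemptive shortest-remaining-processing-time sketch.

    `tasks` is a list of (release_time, processing_time) pairs; returns the
    total completion time of the resulting schedule.
    """
    events = sorted(tasks)          # release events in order of release time
    ready = []                      # min-heap of [remaining_time, task_index]
    t = 0.0                         # current time
    total = 0.0
    i, n = 0, len(events)
    while i < n or ready:
        if not ready:
            t = max(t, events[i][0])          # idle until the next release
        while i < n and events[i][0] <= t:    # release everything due by time t
            heapq.heappush(ready, [events[i][1], i])
            i += 1
        rem, _ = ready[0]
        next_release = events[i][0] if i < n else float('inf')
        run = min(rem, next_release - t)      # run until finish or next release
        t += run
        if run == rem:
            heapq.heappop(ready)
            total += t                        # this task completes at time t
        else:
            ready[0][0] = rem - run           # preempted: decrease the root's key
                                              # (still a valid min-heap)
    return total

# One long task released at 0 and a short one released at 1: SRPT preempts the
# long task, giving completions at times 2 and 4, so the total is 6.
print(srpt_total_completion_time([(0, 3), (1, 1)]))
```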

I found a pretty nice proof. Here's the basic sketch. Suppose we currently have a set $X$ of already-released but incomplete tasks. Let $p_k$ be the smallest remaining processing time in $X$, and let $p_i > p_k$ be the remaining processing time of some other task in $X$.

Suppose we choose to run and complete $a_i$ before we start running $a_k$. With this choice, suppose we go on to solve the entire problem and obtain a total completion time of $C^\prime$, with activity $i$ completing at $c^\prime_i$ and activity $k$ completing at $c^\prime_k > c^\prime_i$.

Now consider the set of intervals in which $a_i$ was scheduled. This may be one continuous interval or multiple intervals, due to the ability to temporarily suspend tasks. Whichever it is, by our choice the intervals in which $a_k$ runs are located after those of $a_i$. Now swap the time allotted to $a_k$ with the first portion, of the same length, of the time allotted to $a_i$, keeping absolutely everything else fixed. Activity $i$ now completes at $c_i = c^\prime_k$, while activity $k$ completes at $c_k \leq c^\prime_i - (p_i - p_k)$, with equality if and only if the last $p_i - p_k$ units of the original $a_i$ intervals form one unbroken piece. Since the swap is the only change we made and every other activity is fixed, the total completion time satisfies $C \leq C^\prime - (p_i - p_k)$. Hence, with a swap, we have strictly decreased the total completion time.
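To make the bookkeeping in that last step explicit: only $c_i$ and $c_k$ change under the swap, and $c_i = c^\prime_k$, so

$$C = C^\prime - (c^\prime_i + c^\prime_k) + (c_i + c_k) = C^\prime - c^\prime_i + c_k \leq C^\prime - (p_i - p_k) < C^\prime,$$

where the final strict inequality uses $p_i > p_k$.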

Hence, the only way to minimize the total completion time is to always run the available activity with the smallest remaining processing time, so that no such swap is possible; this is exactly the greedy rule proposed above, so $C$ is truly minimized.