I'm writing a computer program to do work but there's a partitioning problem. In this program, there're workers and works. The main objectif is to give a balanced partition plan, so that works can be done as soon as possible.
x, number of work typesW, works to doP, number of partitions (workers) to do work, given by the user.
For W, the number of works, it is the sum of x different work types. Each work type has its own works :
W = W1 + W2 + W3 + ... + Wx
For P, it defines the number of partitions (workers) to do work. This value is given by the user, we can't change it. Notice that one partition is attached to a worker. Several workers can do the same type of work, but one worker can only do one type of work.
P = P1 + P2 + P3 + ... + Px
Example
For example, there're 4 apples and 7 bananas. The mission is to eat them all. There're 3 participants (workers). And we assume that eat a banana takes the same time as eating an apple, which is 1 minute / fuit. Since one worker can only do one type of work, each person can only eat one kind of fruit. In order to "balance" the speed, 4 apples will be given to person A, 4 bananas will be given to person B and 3 bananas will be given to person C. So
x = 2categories: apple, banana.W = W1 + W2 = 4 + 7 = 11works to doP = P1 + P2 = 1 + 2 = 3partitions to do work, where 1 partition for eating apples and 2 partitions for eating 7 bananas.
The above plan is computed based on the partition capacity :
avg = W / P = 11 work / 3 partition ≈ 3.67 work / partition
My questions are :
- Is there a specific name for this kind of partitioning problem ?
- Is there any mathematical model / algorithm to handle it ?
- For calculating the partition capacity, is average a good approach ?
Actually, I'm dealing with databases and I need to handle data coming from different tables, probably with billions of rows.
Please edit my post and tags if you find any possible enhancement. Thanks !
First of all, I agree that linear programming is an appropriate tool for this problem. I also want to add that if all tasks take the same amount of time, you can get an approximately optimal solution without much computational work. In fact, your approach of taking averages will work well.
You have $x$ job types, and a list $\vec{w} = [w_1, \ldots, w_x]$ indicating how many tasks $w_i$ there are to do for the job of type $i$.
If, for a specific job type, there are $w$ tasks to be done, and each task takes $t$ time for a person to complete, and there are $p$ persons assigned to work on the job type, then overall the job type will be completed in time:
$$\frac{w\times t}{p}$$
(Here I am assuming that tasks are "collaborative" in a way, meaning that three people can work together to complete a single task in a third the time. If the tasks are noncollaborative, it may be that when you have fewer tasks than people, some of the people will have nothing to do.)
If all tasks take the same amount of time to complete, then the solution is simple: roughly, partition the $P$ people in proportion to how many tasks there are to do. So, let $W_{\text{total}}$ be the total number of tasks in all work types. To each work type $x$, assign approximately the following number of people:
$$p_x = \frac{P\times w_x}{W_{\text{total}}} = P\times \frac{w_x}{W_{\text{total}}}$$ (This is your approach of taking averages.)
Naturally, you may get fractions --- but as a rough approximation, this will be approximately correct. As a further tweak, I would make sure that each work type has at least one worker (so that jobs will finish eventually), then round the rest of the assignments accordingly.