Intro
I need to design an algorithm that will distribute a known set of tasks with a known RAM requirement to a known set of machines with known RAM capacities. The tasks require a certain amount of RAM to execute and the machines have a certain amount of RAM to offer, hence, we can only assign tasks to machines that can satisfy their RAM requirement.
When you assign a task to a machine, you reserve a portion of its RAM for that task. So, for example, if a machine has 100 RAM available, it can run 10 tasks that each require 10 RAM, or one task that requires 50 RAM and another two that require 25 each.
We may also combine machines to get more RAM but we can't split a machine to get smaller machines with less RAM. The catch is that the more tasks you assign to a machine the longer it takes for the machine to finish executing all the tasks, in quadratic time. We have no way to move tasks between machines or run one task at a time on a machine. All tasks are assigned at the start to all the machines and start running at the same time.
I have tried researching whether this is a known problem and I have stumbled upon things like the bin packing problem and from there stumbled upon the bin covering problem, where "items of different sizes must be packed into a finite number of bins or containers, each of which must contain at least a certain given total size, in a way that maximizes the number of bins used". This seems to be closer to my case apart from the fact that my bins can be merged together.
Definition
Say we have a set $M$ of $n$ machines $m$, with each machine having a certain amount of RAM $r$: $M=\{r_1m_1, r_2m_2, ... , r_nm_n\}$
And we have a set $T$ of $k$ tasks $t$ that will all be assigned to these machines, with each task requiring an amount of RAM $x$ to run: $T = \{x_1t_1, x_2t_2, ... , x_kt_k\}$
The sum of all the RAM required by the tasks will never exceed the sum of all the RAM that the set of machines possesses, but can be equal or smaller: $\sum_{i=1}^nr_i \ge \sum_{i=1}^kx_i$
When a task is assigned to a machine, the machine will have $r-x$ RAM remaining for potential future assignments
If a single machine's RAM is not enough to satisfy the requirement of a certain task, then two machines can be combined to form a machine that has the same RAM as the sum of its component machines. If that's still not enough, an additional machine can be added with its RAM also added, and so on. The machines can be combined in any way, including combining all the machines into one which will have the same RAM as the sum of the RAMs of the entire set $M$.
Each task takes one unit of CPU time to complete assuming it's the only task that's been assigned to the machine. If there's two tasks assigned to one machine then they will take 4 units of CPU time to compelete, if there's three tasks assigned to a machine then they will take 9 units of CPU time to complete, and so on, with $n$ tasks assigned to one machine taking $n^2$ time to complete.
Constraints
- A machine $m$ itself cannot be split into submachines with smaller RAM. They can only combined and uncombined with each other.
- The tasks must be assigned at the start and cannot be shifted between machines after the initial assignment. If a task is assigned to a machine then it will have to be executed in parallel to any other task assigned to that machine.
- The task execution begins only after all the tasks are assigned, hence, we can't just assign one task to one machine and then wait for it to finish executing and then feed it the next task in order to sidestep the problem of parallel task executing requiring quadratic time.
Goals
Create a configuration of machines (by combining or not combining them) and assign all the tasks to those machines such that (in order of importance):
- Minimize the total amount of CPU time that is consumed. The fact that separate machines can run in parallel is irrelevant, as the algorithm strives to minimize the total CPU time necessary to complete all the tasks. You can think of it as that only one machine can be active at a time;
- The machines should be combined as little as possible.
The algorithms that I came up with
In order to better illustrate what needs to be achieved, let's use a concrete example: $$ M = \{92m_1, 56m_2, 15m_3\} \\ T = \{30t_1, 28t_2, 16t_3, 14t_4, 1t_5\} $$
1. (probably) the worst possible algorithm, combine all machines into one and assign all tasks to it:
We combine all machines into machine $m_1m_2m_3$ which has RAM = $92 + 56 + 15 = 163$ and assign all tasks to it. We get the following tree:
As you can see, we have 5 tasks running in parallel on the one big combined machine. Recall that the more tasks you're running in parallel the more CPU time you need, quadratically. $Time = 5^2 = 25$
Result: 25 units of CPU time
2. The best algorithm I could come up with, biggest-task-to-biggest-machine
Take both sets and sort them, biggest coefficient first: $$ M = \{92m_1, 56m_2, 15m_3\} \\ T = \{30t_1, 28t_2, 16t_3, 14t_4, 1t_5\} $$
Take the largest (first) element of $T$ and check if it's less than or equal to the largest (first) element of $M$. If yes, assign $t_1$ to $m_1$. If no, combine $m_1 + m_2$, and check again. And so on until you combined all the machines.
The machine's coefficient with an assigned task is also reduced by the amount of RAM the task requires. Once a task is assigned to a machine you add that machine-task tuple to a new set $M_n$ where the subscript $n$ indicates how many tasks are assigned to each machine in the set. The initial set of unassigned machines can be denoted as $M_0$. The insertion also happens into the index necessary for the set to remain sorted in descending order.
When checking for machines to satisfy a task's RAM, check first the largest $r$ value in the 0-assigned-tasks-to-the-machine set $M_0$, then try combining the largest+2nd largest $r$, then the 3 largest and so on, until you've tried all the possible combinatons from the $M_0$ set, which is all the possibilities that you end up with a machine or machine combination that will enter set $M_1$ after the $t$ assignment. Then, in a similar way, you try all the possibilities that will end up putting the machine(combo) into the $M_2$ set, and so on.
And so we have: $$ x_1=30; r_1=92; 30 \le 92 ? ✅ \implies t_1 \in m_1; r_1 = 92 - 30 = 62; \\ \underline{M_0 = \{56m_2, 15m_3\}, M_1 = \{62m_1[30t_1]\}} \\ x_2=28; r_2=56; 28 \le 56 ? ✅ \implies t_2 \in m_2; r_2 = 56 - 28 = 28; \\ \underline{M_0 = \{15m_3\}, M_1 = \{62m_1[30t_1], 28m_2[28t_2]\}} \\ x_3=16; r_3=15; 16 \le 15 ? ❌ r_3 + r_1 = 15 + 62 = 77; 16 \le 77 ? ✅ \implies t_3 \in m_1m_3; r_1r_3 = 77 - 16 = 61; \\ \underline{M_0 = \{\}, M_1=\{28m_2[28t_2]\}, M_2=\{61m_1m_3[30t_1,16t_3]\}} \\ x_4=14; 14 \le 28 ? ✅ \implies t_4 \in m_2; r_2 = 28 - 14 = 14; \\ \underline{M_1 = \{\}, M_2=\{61m_1m_3[30t_1,16t_3], 14m_2[28t_2,14t_4]\}} \\ x_5=1; 1 \le 61 ? ✅ \implies t_5 \in m_1m_3; r_1r_3=61-1=60; \\ \underline{M_2=\{14m_2[28t_2,14t_4]\}, M_3=\{60m_1m_3[30t_1,16t_3,1t_5]\}} $$
This results in the following forest (set of trees):

The tasks on $m_2$ will complete in $2^2=4$ CPU time, the tasks on $m_1m_3$ will complete in $3^2=9$ CPU time. So the total is: $Time=2^2+3^2=4+9=13$
Result: 13 units of CPU time (almost a 2x improvement over the worst algorithm!)
Anyone have a better algorithm?

You can solve the problem via integer linear programming as follows. Let $G=\{1,\dots,|M|\}$ be the set of groups of machines, with some groups possible empty. Let $K=\{0,\dots,|T|\}$ be the possible counts of tasks per group. Let $d_t$ be the demand of task $t$, and let $c_m$ be the capacity of machine $m$.
For $t\in T$ and $g\in G$, let binary decision variable $x_{tg}$ indicate whether task $t$ is assigned to group $g$. For $m\in M$ and $g\in G$, let binary decision variable $y_{mg}$ indicate whether machine $m$ is assigned to group $g$. For $g\in G$ and $k\in K$, let binary decision variable $z_{gk}$ indicate whether group $g$ contains exactly $k$ tasks. The problem is to minimize $$\sum_{g \in G} \sum_{k \in K} k^2 z_{gk}$$ subject to linear constraints \begin{align} \sum_{g \in G} x_{tg} &= 1 &&\text{for $t \in T$} \tag1\label1 \\ \sum_{g \in G} y_{mg} &= 1 &&\text{for $m \in M$} \tag2\label2 \\ \sum_{k \in K} z_{gk} &= 1 &&\text{for $g \in G$} \tag3\label3 \\ \sum_{t \in T} d_t x_{tg} &\le \sum_{m \in M} c_m y_{mg} &&\text{for $g \in G$} \tag4\label4 \\ \sum_{t \in T} x_{tg} &= \sum_{k \in K} k z_{gk} &&\text{for $g \in G$} \tag5\label5 \end{align}
Constraint \eqref{1} assigns each task to exactly one group. Constraint \eqref{2} assigns each machine to exactly one group. Constraint \eqref{3} assigns each group exactly one count. Constraint \eqref{4} ensures that each group has sufficient capacity. Constraint \eqref{5} enforces that the count assigned to each group is correct.
For your example, the problem is to minimize $$z_{1,1} + 4z_{1,2} + 9z_{1,3} + 16z_{1,4} + 25z_{1,5} + z_{2,1} + 4z_{2,2} + 9z_{2,3} + 16z_{2,4} + 25z_{2,5} + z_{3,1} + 4z_{3,2} + 9z_{3,3} + 16z_{3,4} + 25z_{3,5}$$ subject to \begin{align} x_{1,1} + x_{1,2} + x_{1,3} &= 1 \\ x_{2,1} + x_{2,2} + x_{2,3} &= 1 \\ x_{3,1} + x_{3,2} + x_{3,3} &= 1 \\ x_{4,1} + x_{4,2} + x_{4,3} &= 1 \\ x_{5,1} + x_{5,2} + x_{5,3} &= 1 \\ y_{1,1} + y_{1,2} + y_{1,3} &= 1 \\ y_{2,1} + y_{2,2} + y_{2,3} &= 1 \\ y_{3,1} + y_{3,2} + y_{3,3} &= 1 \\ z_{1,0} + z_{1,1} + z_{1,2} + z_{1,3} + z_{1,4} + z_{1,5} &= 1 \\ z_{2,0} + z_{2,1} + z_{2,2} + z_{2,3} + z_{2,4} + z_{2,5} &= 1 \\ z_{3,0} + z_{3,1} + z_{3,2} + z_{3,3} + z_{3,4} + z_{3,5} &= 1 \\ 30x_{1,1} + 28x_{2,1} + 16x_{3,1} + 14x_{4,1} + x_{5,1} &\le 92y_{1,1} + 56 y_{2,1} + 15y_{3,1} \\ 30x_{1,2} + 28x_{2,2} + 16x_{3,2} + 14x_{4,2} + x_{5,2} &\le 92y_{1,2} + 56 y_{2,2} + 15y_{3,2} \\ 30x_{1,3} + 28x_{2,3} + 16x_{3,3} + 14x_{4,3} + x_{5,3} &\le 92y_{1,3} + 56 y_{2,3} + 15y_{3,3} \\ x_{1,1} + x_{2,1} + x_{3,1} + x_{4,1} + x_{5,1} &= z_{1,1} + 2z_{1,2} + 3z_{1,3} + 4z_{1,4} + 5z_{1,5} \\ x_{1,2} + x_{2,2} + x_{3,2} + x_{4,2} + x_{5,2} &= z_{2,1} + 2z_{2,2} + 3z_{2,3} + 4z_{2,4} + 5z_{2,5} \\ x_{1,3} + x_{2,3} + x_{3,3} + x_{4,3} + x_{5,3} &= z_{3,1} + 2z_{3,2} + 3z_{3,3} + 4z_{3,4} + 5z_{3,5} \end{align}
To avoid adding too many machines to a group, you can relax \eqref{2} to $\le 1$ and impose the following additional (big-M) constraints: \begin{align} 1 + \sum_{m' \in M \setminus \{m\}} c_m y_{m'g} - \sum_{t \in T} d_t x_{tg} &\le \left(1+\sum_{m' \in M \setminus \{m\}} c_m-0\right)(1-y_{mg}) &&\text{for $m\in M$ and $g \in G$} \tag6\label6 \end{align} These constraints enforce the following logical implication: $$y_{mg} = 1 \implies \sum_{t \in T} d_t x_{tg} \ge 1 + \sum_{m' \in M \setminus \{m\}} c_m y_{m'g}$$ The idea is that if machine $m$ appears in group $g$ then the other machines $m'$ that appear in group $g$ do not have sufficient capacity without machine $m$.
For your small example, these expand as follows: \begin{align} 1 + 56y_{2,1} + 15y_{3,1} - 30x_{1,1} - 28x_{2,1} - 16x_{3,1} - 14x_{4,1} - x_{5,1} &\le 72(1-y_{1,1}) \\ 1 + 56y_{2,2} + 15y_{3,2} - 30x_{1,2} - 28x_{2,2} - 16x_{3,2} - 14x_{4,2} - x_{5,2} &\le 72(1-y_{1,2}) \\ 1 + 56y_{2,3} + 15y_{3,3} - 30x_{1,3} - 28x_{2,3} - 16x_{3,3} - 14x_{4,3} - x_{5,3} &\le 72(1-y_{1,3}) \\ 1 + 92y_{1,1} + 15y_{3,1} - 30x_{1,1} - 28x_{2,1} - 16x_{3,1} - 14x_{4,1} - x_{5,1} &\le 108(1-y_{2,1}) \\ 1 + 92y_{1,2} + 15y_{3,2} - 30x_{1,2} - 28x_{2,2} - 16x_{3,2} - 14x_{4,2} - x_{5,2} &\le 108(1-y_{2,2}) \\ 1 + 92y_{1,3} + 15y_{3,3} - 30x_{1,3} - 28x_{2,3} - 16x_{3,3} - 14x_{4,3} - x_{5,3} &\le 108(1-y_{2,3}) \\ 1 + 92y_{1,1} + 56y_{2,1} - 30x_{1,1} - 28x_{2,1} - 16x_{3,1} - 14x_{4,1} - x_{5,1} &\le 149(1-y_{3,1}) \\ 1 + 92y_{1,2} + 56y_{2,2} - 30x_{1,2} - 28x_{2,2} - 16x_{3,2} - 14x_{4,2} - x_{5,2} &\le 149(1-y_{3,2}) \\ 1 + 92y_{1,3} + 56y_{2,3} - 30x_{1,3} - 28x_{2,3} - 16x_{3,3} - 14x_{4,3} - x_{5,3} &\le 149(1-y_{3,3}) \end{align}