Best way to split N different-sized sets into M balanced groups


I'm facing a combinatorial problem that is hard (for me) to solve. I start from N sets of different sizes; for instance, imagine 13 sets with 2, 5, 300, 6, 7, 500, 1000, 234, 313, 200, 100, 37, 48 elements respectively. Now I need to take these sets and distribute them among M groups so that each group's total size is as close as possible to the average over the M groups. In other words, I need to "equidistribute" the N sets to get the most "balanced" distribution of them into M groups. To clarify with the case above, take M = 4, so I need to end up with 4 groups of subsets. The sizes sum to 2752, which averaged over 4 groups gives 688.

So I'd need to group them so that each group's sum is as close as possible to 688, whether below or above.

Here 1000 greatly exceeds the average, so it makes a group on its own: {1000}. Then we could have {500, 100}, then {234, 313, 48}, then {37, 300, 200, 2, 5, 6, 7}.

Of course the above distribution is far from optimal. What I need is an algorithm that minimizes the difference between each group's sum and the average, a kind of nesting problem.
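As a first cut, a simple greedy heuristic often gets close: sort the sizes in decreasing order and always drop the next size into the currently lightest group (the classic LPT rule from multiprocessor scheduling; the function name here is mine, and optimality is not guaranteed):

```python
def greedy_split(sizes, m):
    """LPT-style heuristic: place each size (largest first) into the
    group whose running total is currently smallest.  Fast, but not
    guaranteed to be optimal in general."""
    groups = [[] for _ in range(m)]
    totals = [0] * m
    for s in sorted(sizes, reverse=True):
        g = totals.index(min(totals))   # index of the lightest group so far
        groups[g].append(s)
        totals[g] += s
    return groups

sizes = [2, 5, 300, 6, 7, 500, 1000, 234, 313, 200, 100, 37, 48]
print(sorted(sum(g) for g in greedy_split(sizes, 4)))  # → [576, 576, 600, 1000]
```

On the sample data this heuristic happens to reach group sums 576, 576, 600, 1000, i.e. a total deviation from 688 of 112 + 112 + 88 + 312 = 624, though on other inputs it can fall short of the optimum.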

It seems to me this could be related to the pigeonhole principle, as a guess, but I don't know how to apply it here.

I know the problem is a bit tricky to explain; sorry for my English.

Thanks in advance for your help.


BEST ANSWER

You can solve the problem via integer linear programming as follows. Let $a_s$ be the size of set $s$, and let $t=\frac{1}{M} \sum_{s=1}^N a_s$ be the target group sum. Let binary decision variable $x_{s,g}$ indicate whether set $s$ is assigned to group $g$. The problem is to minimize $$\sum_{g=1}^M \left|\sum_{s=1}^N a_s x_{s,g} - t\right| \tag1$$ subject to $$\sum_{g=1}^M x_{s,g} = 1 \quad\text{for $s\in\{1,\dots,N\}$} \tag2 $$ The objective $(1)$ minimizes the total absolute deviation from the target. Constraint $(2)$ assigns each set to exactly one group.

You can linearize the absolute value by introducing nonnegative variables $y_g^+$ and $y_g^-$ and linear constraints $$\sum_{s=1}^N a_s x_{s,g} - t = y_g^+ - y_g^- \quad \text{for $g\in\{1,\dots,M\}$} \tag3$$

The linearized problem is to minimize $$\sum_{g=1}^M (y_g^+ + y_g^-)$$ subject to $(2)$ and $(3)$.

For your sample data, an optimal solution is \begin{align} x_{7,1}&=1\\ x_{6,2}&=1\\ x_{3,3}&=x_{10,3}=x_{12,3}=x_{13,3}=1\\ x_{1,4}&=x_{2,4}=x_{4,4}=x_{5,4}=x_{8,4}=x_{9,4}=x_{11,4}=1, \end{align} with group sizes $1000,500,585,667$ and objective value $$|1000-688|+|500-688|+|585-688|+|667-688| = 624.$$
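For an instance this small, the optimum can also be verified without an ILP solver. The sketch below (mine, not the model above) is a pure-Python branch-and-bound that exploits the fact that, since the group sums always total $Mt$, the objective equals twice the total overshoot above $t$:

```python
def balanced_partition(sizes, m):
    """Exact branch-and-bound: partition `sizes` into m groups minimising
    sum_g |S_g - t|, where t = sum(sizes) / m.  Uses the identity
    objective = 2 * (total overshoot above t) as a pruning bound."""
    t = sum(sizes) / m
    items = sorted(sizes, reverse=True)   # large items first tightens bounds early
    best_obj = [float("inf")]
    best_groups = [None]

    def recurse(i, sums, assign):
        if i == len(items):
            obj = sum(abs(s - t) for s in sums)
            if obj < best_obj[0]:
                best_obj[0] = obj
                best_groups[0] = [list(g) for g in assign]
            return
        # lower bound: final objective = 2 * total overshoot >= 2 * current overshoot
        if 2 * sum(max(0.0, s - t) for s in sums) >= best_obj[0]:
            return
        seen = set()                      # groups with equal sums are interchangeable
        for g in sorted(range(m), key=sums.__getitem__):  # try lightest group first
            if sums[g] in seen:
                continue
            seen.add(sums[g])
            sums[g] += items[i]
            assign[g].append(items[i])
            recurse(i + 1, sums, assign)
            assign[g].pop()
            sums[g] -= items[i]

    recurse(0, [0.0] * m, [[] for _ in range(m)])
    return best_obj[0], best_groups[0]

sizes = [2, 5, 300, 6, 7, 500, 1000, 234, 313, 200, 100, 37, 48]
print(balanced_partition(sizes, 4)[0])  # → 624.0
```

This confirms the objective value $624$. Indeed $624$ is a provable lower bound: the group containing the size-$1000$ set has sum at least $1000$, hence overshoot at least $1000-688=312$, and the objective is twice the total overshoot.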

If you prefer instead to minimize the difference between the largest and smallest group sums, you can linearize similarly to obtain group sizes $1000, 600, 576, 576$, for an optimal range of $1000-576=424$.

ANSWER

Rather than thinking of it as sets, think of it as objects $i\in I=\{1,2,\cdots,n\}$ with weights $w_i.$ Then you want to partition $I=I_1\cup \cdots \cup I_m$ so that each $S_j=\sum_{i\in I_j} w_i$ is, in some sense, close to $$\frac{1}{m}\sum_{i=1}^n w_i.$$

There are a lot of ways to make “in some sense close” precise, so you are going to need a clarification there.

One measure would be to maximize $W_{\min}=\min_j S_j.$ Or to minimize $W_{\max}=\max_j S_j.$ Or to minimize $W_{\max}-W_{\min}.$
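Each of these measures is trivial to evaluate for a candidate partition. A small helper (names mine), applied to one partition realizing the group sums $1000, 600, 576, 576$ reported in the accepted answer's min-range variant (that answer gives only the sums, not the membership):

```python
def partition_measures(groups):
    """Return (W_min, W_max, W_max - W_min) over the group sums."""
    sums = [sum(g) for g in groups]
    return min(sums), max(sums), max(sums) - min(sums)

# one partition of the question's sizes whose group sums are 1000, 600, 576, 576
groups = [[1000], [500, 100], [313, 200, 48, 7, 6, 2], [300, 234, 37, 5]]
print(partition_measures(groups))  # → (576, 1000, 424)
```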

The ideal is, of course, $W_{\max}=W_{\min},$ but that won’t always be possible.

This smells like the knapsack problem, but with some key differences. The knapsack problem has been studied a lot, but I’m not sure it directly applies to this problem.