How to group combinations minimizing the number of elements in groups

208 Views Asked by At

In a software project I'm currently working on, we need to process every possible pair combinations of a given set $S$ of $k$ elements, where $S = \{0,1,\dots,k-1\}$.

Let $k \in \mathbb{N}$. The number of pairs will always be $p = {{k}\choose{2}} = \frac{k\cdot(k - 1)}{2}$.

As $k$ is quite high, and it is computationally expensive to process a pair, we have then decided to compute these pairs in parallel, more specifically in a cluster. We only have $n$ computers in this cluster though, so that each computer must process a group of pairs rather than just one.

Let $n \in \mathbb{N}$, $n < p$.

In order to a computer $n_0$ process a pair, we have to send it all elements in its group. And although I'm representing these elements as numbers, in reality, they are big objects. It is relevant to note that an element is transfered at most once to a computer, even if it is part of more than one pair in its group.

Now, my problem is grouping these pairs in such a way that:

  • The number of elements transfered to each computer is the minimum.
  • The number of pairs processed by each computer is the same, when possible.

Does anyone know a way to do this efficiently?

1

There are 1 best solutions below

3
On

A 2-design (BIBD) should meet your requirements. Let the points of the design be the set $S$ elements you wish to process, and let the blocks of the design be the computers. A point is in a block if and only if the associated object is transferred to the computer. Each computer then processes all of the pairs of objects it receives, and there is no risk of duplication due to the structure of the two design.

As an example, using the Fano plane with points $0 \ldots 7$ and blocks:

  • 3,4,6
  • 2,3,5
  • 1,5,6
  • 0,2,6
  • 0,1,3
  • 0,4,5
  • 1,2,4

In this example, each of 7 computers gets three objects and processes three pairs.

There exist designs for a variety of parameters. For relatively small numbers, I recommend the Handbook of Combinatorial Designs (Colbourn and Dinitz, ed.), which lists explicit examples of designs.

UPDATE: (responding to comment) It depends on the parameters. In design theory terms, you will know the values of $v$, the number of points (test cases); $b$, the number of blocks (computers); and $\lambda$, the replication number (1, since you only need to do each calculation once). Given $v$, you must pick $k$, the number of test cases given to each computer (which you will want to minimize) and $r$, the number of computers that get each test case to satisfy the relationships

  • $v-1 = r(k-1)$, and
  • $bk=vr$

The field of design theory is very rich and has a variety of constructions, but I don't think there is a general "pick v and b, and here's your design" construction. The Handbook of Combinatorial Designs I referenced earlier has the best list of known designs of relatively small order that I know of. If you provide specific parameters, I might be able to help more.