Unique combinations of datapoints into two bins?

296 Views Asked by At

I have a set of data of size $X$, say $X = 7$. I want to find all of the unique ways that the data can be grouped into two bins of a minimum size of two. For the example where $X = 7$, I have:

 Bin A: 5     4
 Bin B: 2     3

As the possible arrangements. What I need is what the actual arrangements are. So if the data for $X = 7$ up there is $A B C D E F G$ for the first column there then I need:

Combination 1: Bin A: A B C D E
Bin B: F G

Combination 2: Bin A: A B C D G
Bin B: F E

Combination 3: Bin A: A B E D G
Bin B: F C

etc.

until I have all possible combinations for the data. The restictions I have are that a bin has to have a minimum size of 2 and the bins are non-unique, so {A B} is the same as {B A}.

2

There are 2 best solutions below

5
On BEST ANSWER

What we need is to split set $A$ with $X$ elements into two subsets $C$ and $D$ with at least two elements each.

To split this into two sets of size $x$ and $X-x$ we can select randomly $x$ elements. We can do that in $\binom{X}{x}$ ways. Notice, that selecting $X-x$ elements out of $X$ would give the same result. Thus we need to do groupings for $x\leq \left\lfloor\frac{X}{2}\right\rfloor$

Finally the number of splitting elements into two bins with alt least two elements each is equal to

$$\sum_{x=2}^{\left\lfloor\frac{X}{2}\right\rfloor}\binom{X}{x}$$

For example for $X=7$ we have $$\sum_{x=2}^{3}\binom{7}{x} = \binom{7}{2}+\binom{7}{3}=21+35=56 $$

The algorithm could be represented simply by following python code:

import math

X=7
possibleLengths=range(2, int(math.floor(X/2))+1)

def GetAllPossibleSubsets(prefix, minimumValue, numberOfElements):
  if numberOfElements == 0:
    print prefix
    return
  if X-minimumValue < numberOfElements:
    return 
  for x in range(minimumValue, X):
    GetAllPossibleSubsets(prefix+[x], x+1, numberOfElements-1)

for x in possibleLengths:
  GetAllPossibleSubsets([], 0, x)
8
On

Assume those $n$ objects are all distinct, then the function you need to generate all combinations of $C(n,r)$ is called nextCombination$^{\dagger}$.

First, label each of your $n$ objects with a number in $\{1,2,3\dots,n\}$ so there is a one-to-one correspondence between the numbers and your objects. With some rules, you can generate all distinct $C(n,r)$ combinations:

(1) Since you want to pick $r$ out of $n$ objects, the starting sequence is $123\textrm{...r}$, and you also need the sequence $123\textrm{...n}$.

(2) Compare their last digit: the last digit of your $123\textrm{...r}$ and the last digit of a the sequence $123\textrm{...n}$.

(3) If they are the same, go on to compare their one-before-the-last digit and so on. If they're not, increase this digit in your $123\textrm{...r}$ by one, truncate those following digits, and append the same number of digits truncated to the digit you just increased, each of such digit is one more than the digit at its left hand side. (See the example because it's hard for me to describe it well.)

As you can verify, the sequence $143$ won't appear the list. So you won't re-count the combination represented by $134$.


Example:

Consider the case $C(5,3)$:

$ n:12345\\ r:123$

That $5\not=3$, so the next one is $12(3+1)=124$.

When $r$ is $145$, then $5=5,4=4,3\not=1$, so $145\to 1\_\ \_\to 2\_\ \_\to 2\ 3\ \_\to 2\ 3\ 4\to 234$.

And actually you can start from any sequence $a_1a_2a_3$ s.t. $a_1<a_2<a_3$. So $245\to345$, and this is the end of $C(5,3)$!


$^{\dagger}$if you want to dive into the details of code, see my little program in C++, the function has only fifteen lines. But in general don't care which language I used, the logic is the same, it's called algorithm.