Algorithm wanted: Enumerate all subsets of a set in order of increasing sums

9k Views Asked by At

I'm looking for an algorithm but I don't quite know how to implement it. More importantly, I don't know what to google for. Even worse, I'm not sure it can be done in polynomial time.

Given a set of numbers (say, {1, 4, 5, 9}), I want to enumerate all subsets of this set (its power set, really) in a certain order: increasing sum of the elements.

For example, given {1, 4, 5, 9}, the subsets should be enumerated in this order, "smaller" sets first:

{} = 0

{1} = 1

{4} = 4

{5} = 5

{1, 4} = 5

{1, 5} = 6

{9} = 9

{4, 5} = 9

{1, 9} = 10

{1, 4, 5} = 10

{4, 9} = 13

{5, 9} = 14

{1, 4, 9} = 14

{1, 5, 9} = 15

{4, 5, 9} = 18

{1, 4, 5, 9} = 19

This feels like some unholy mix between a breadth-first search and a depth-first search, but I can't wrap my head around the proper way to mix these two search strategies.

My search space is very large ($2^{64}$ elements) so I can't precompute them all up-front and sort them. On that note, I also don't need to enumerate the entire search space -- the smallest 4,096 subsets is fine, for example.

Can anyone give any pointers or even any clues to google for? Many thanks.

3

There are 3 best solutions below

5
On BEST ANSWER

Here's an algorithm. The basic idea is that each number in the original set iterates through the list of subsets you've already found, trying to see if adding that number to the subset it's currently considering results in the smallest subset sum not yet found.

The algorithm uses four arrays (all of which are indexed starting with $0$).

  1. $N$ consists of the numbers in the original set; i.e., $N = [1, 4, 5, 9]$ in your example.
  2. $L$ is the list of subsets found so far.
  3. $A[i]$ contains the subset that $N[i]$ is currently considering.
  4. $S[i]$ is the sum of the elements of subset $i$ in $L$.

Algorithm:

  1. Initialize $N$ to numbers in the original set, all entries of $A$ to $0$, $L[0] = \{\}$, $S[0] = 0$. Let $j = 1$.
  2. For iteration $j$ find the minimum of $S[A[i]] + N[i]$ over all numbers $N[i]$ in the original set. (This finds the subset with smallest sum not yet in $L$.) Tie-breaking is done by number of elements in the set. Let $i^*$ denote the argmin.
  3. Let $L[j] = L[A[i^*]] \cup \{N[i^*]\}$. Let $S[j] = S[A[i^*]] + N[i^*]$. (This updates $L$ and $S$ with the new subset.)
  4. Increase $A[i^*]$ to the next item in $L$ that has no number larger than $N[i^*]$. If there is none, let $A[i^*] =$ NULL. (This finds the next subset in $L$ to consider for the number $N[i^*]$ just added to an existing subset in $L$ to create the subset just added to $L$.)
  5. If all entries in $A[i]$ are NULL, then stop, else increment $j$ and go to Step 2.


For example, here are the iterations for your example set, together with the subset in $L$ currently pointed to by each number.

Initialization:     
{}   1, 4, 5, 9

Iteration 1:    
{}   4, 5, 9
{1}  

Iteration 2:  
{}   5, 9
{1}  4 
{4}

Iteration 3:  
{}   9
{1}  4, 5
{4}
{5}

Iteration 4:  
{}   9
{1}  5
{4}
{5}
{1,4}

Iteration 5:  
{}   9
{1}  
{4}  5
{5}
{1,4}
{1,5}

Iteration 6:  
{}   
{1}  9
{4}  5
{5}
{1,4}
{1,5}
{9}

Iteration 7:  
{}   
{1}    9
{4}  
{5}
{1,4}  5
{1,5}
{9}
{4,5}

Iteration 8:  
{}   
{1}    
{4}    9
{5}
{1,4}  5
{1,5}
{9}
{4,5}
{1,9}

Iteration 9:  
{}   
{1}    
{4}    9
{5}
{1,4}  
{1,5}
{9}
{4,5}
{1,9}
{1,4,5}

And the rest of the iterations just involve adding $9$ successively to each subset already constructed that doesn't include $9$.

1
On

(Too long for a comment.)

Nijenhuis and Wilf's Combinatorial Algorithms details an algorithm for enumerating all the subsets of a given set, based on the Gray code (there is also a modification that makes it produce the subsets lexicographically). The book can be downloaded for free from the link I gave. You can download the FORTRAN routines NEXSUB() and LEXSUB() from Skiena's webpage. Some modification would be needed to have it output results in your desired order.

6
On

A simpler Python implementation (based on my previous "Ugly Numbers" solutions but I guess influenced by skimming the answers here). I build the subsets in order of increasing sum and keep a list of subsets built so far. For each number of the whole set, I have a generator that tries to add the number to the subsets built so far. Then I just merge these generators and extend the list with their results.

from heapq import merge

def subsets(wholeset):
    yield set()
    subsets = [set()]
    def add(number):
        for subset in subsets:
            if not subset or number > max(subset):
                yield subset | {number}
    for subset in merge(*map(add, wholeset), key=sum):
        yield subset
        subsets.append(subset)

Demo (Try it online!):

>>> print(*subsets({1, 4, 5, 9}))
set() {1} {4} {1, 4} {5} {1, 5} {4, 5} {9} {1, 4, 5} {1, 9} {9, 4} {1, 4, 9} {9, 5} {1, 5, 9} {9, 4, 5} {1, 4, 5, 9}

Benchmarks with coneyhelixlake's solution, the first case is the case from the question:

First 4096 subsets from random.sample(range(10**6), 64)
  11.9 ms ±  0.8 ms  subsets
 126.7 ms ±  3.4 ms  get_next_combs

First 100000 subsets from random.sample(range(10**6), 64)
 472.2 ms ±  8.4 ms  subsets
2751.2 ms ± 26.1 ms  get_next_combs

First 32768 subsets from random.sample(range(10**6), 15)
 144.4 ms ± 12.9 ms  subsets
 493.0 ms ± 16.4 ms  get_next_combs

Benchmark code (Try it online!:

def subsets(wholeset):
    yield set()
    subsets = [set()]
    def add(number):
        for subset in subsets:
            if not subset or number > max(subset):
                yield subset | {number}
    for subset in merge(*map(add, wholeset), key=sum):
        yield subset
        subsets.append(subset)

def get_next_combs(N, n=4):
  yield set()
  A = [0]*len(N)
  L = [set()]
  S = [0]
  j = 1
  while any(elem is not None for elem in A):
    i_star = np.argmin([S[A[i]] + N[i] if A[i] is not None else np.inf for i in range(0, len(N))])
    comb = L[A[i_star]].union((N[i_star],))
    L.append(comb)
    yield comb
    S.append(S[A[i_star]] + N[i_star])
    i = A[i_star]
    A[i_star] = None
    for i_next in range(i+1, len(L)):
      if N[i_star] > max(L[i_next]):
        A[i_star] = i_next
        break

solutions = [subsets, get_next_combs]

from timeit import default_timer as time
import random
from itertools import islice
import numpy as np
from heapq import merge
from statistics import mean, stdev

def consume(iterator, n):
    next(islice(iterator, n, n), None)

def test(wholeset, first_n):
    print('First', first_n, 'subsets from', wholeset)
    wholeset = eval(wholeset)

    # Correctness
    expect = list(islice(solutions[0](wholeset), first_n))
    for solution in solutions[1:]:
        result = list(islice(solution(wholeset), first_n))
        assert result == expect

    # Statistics initialization
    times = {solution: [] for solution in solutions}
    def stats(solution):
        ts = sorted(times[solution])[:5]
        return '%6.1f ms ± %4.1f ms ' % (mean(ts) * 1e3, stdev(ts) * 1e3)

    # Measure times
    for _ in range(10):
        random.shuffle(solutions)
        for solution in solutions:
            start = time()
            consume(solution(wholeset), first_n)
            end = time()
            times[solution].append(end - start)

    # Report statistics
    for solution in sorted(solutions, key=stats):
        print(stats(solution), solution.__name__)
    print()

test('random.sample(range(10**6), 64)', 4096)
test('random.sample(range(10**6), 64)', 100000)
test('random.sample(range(10**6), 15)', 2 ** 15)