Finding the top sums of values from multiple lists

712 Views Asked by At

There are multiple lists given. The number of lists is arbitrary. Each list contains numbers and is sorted descendingly. We shall take exactly $1$ element from each list and calculate the sum of elements. How would you go about finding the top $N$ sums?

Example:

L1 = [5,3,3,1]
L2 = [8,7,0,0]
L3 = [3,1,1,1]

Top 1: 5+8+3=16
Top 2: 5+7+3=15
etc.

The naive solution is really slow (finding all possible sums and then sorting them).


Similar problem here: Subset sum with multiple lists

2

There are 2 best solutions below

11
On BEST ANSWER

First of all, the important thing is that you need to consider at each iteration new cases where just one step to the right in one of the lists is made, e.g. starting from the solution $5+8+3$ you obtain $3+8+3\ ,\ 5+7+3\ ,\ 5+8+1$ . Never more than $1$ step and never in more than $1$ list otherwise you will get a lower sum.

So the problem is to deal with the "rollbacks" (steps to the left) in all the list except the one you go one step to the right. The thing so is to create an algorithm where you always compare all the possible rollbacks at each new choice of the current top sum.

You can do this incrementing by one each of the indices of the current top sum and adding this possible new top sums to the comparison.

I clarify with your example :

L1 = [5,3,3,1]
L2 = [8,7,0,0]
L3 = [3,1,1,1]


Start with indices 0,0,0 and best is 5+8+3 and so do not use this in the comparison, 
use just the 3 incremented sums

                                      (5+8+3,[0,0,0])

                                  /          |          \
                   (3+8+3,[1,0,0])    (5+7+3,[0,1,0])    (5+8+1,[0,0,1])

The current best now is 5+7+3 with indices 0,1,0 and do not consider that in the 
next comparison, just use the other 5 sums

                    (3+8+3,[1,0,0])      (5+7+3,[0,1,0])     (5+8+1,[0,0,1])

                                      /         |        \
                       (5+0+3,[0,2,0])   (3+7+3,[1,1,0])  (5+0+3,[0,1,1])  

   
The best is now 3+8+3 with indices 1,0,0 so we expand it ([0,0,1] has the same sum 
so at the next iteration this sum will be taken)

As you see all the sums which compete to be the current best sum are taken into account at every new choice.

You can use a heap queue to implement the idea, remember to use a set where you add the new indices to compare so that to avoid duplicates in the queue. With $k$ lists, the time complexity is $O(Nk\cdot\max(\log k,\log N))$.


This is a Python implementation :

import heapq

def top_solutions(N,lists):

    N_best = []
    k, len_k_lists = len(lists), [len(x) for x in lists]
    init_sol = (-sum(x[0] for x in lists),tuple(0 for x in range(k)))
    heap_list = [init_sol]
    seen = {init_sol[1]}
    
    for _ in range(N) :
        
        curr_best = heapq.heappop(heap_list)
        N_best.append(curr_best)

        ind = curr_best[1]

        for x in range(k) :

            if len_k_lists[x] > ind[x]+1 : r = tuple(c if i != x else c+1 for i,c in enumerate(ind))
            else : continue

            if r not in seen :
                curr_sum = curr_best[0]-lists[x][r[x]]+lists[x][r[x]-1]
                curr_value = (curr_sum,r)
                heapq.heappush(heap_list,curr_value)
                seen.add(r)                

    return [(-x[0],x[1]) for x in N_best]
                                               
N = 5

lists = [ [5,3,3,1],
          [8,7,0,0],
          [3,1,1,1] ]

print(top_solutions(N,lists))
# output : [[16, [0, 0, 0]], [15, [0, 1, 0]], [14, [0, 0, 1]], [14, [0, 0, 2]], [14, [0, 0, 3]]]

Anyway you will obtain duplicate top sums, but with a different set of indices. If you are interested in just the values of the sums (e.g. you don't want two top sums equal to $14$ in the example ) then, first of all, you can eliminate the duplicate numbers in each list, and then check if you have $N$ different sums after $N$ iterations, if not you continue to run the procedure until you obtain them, I think there are also optimization to speed it up considerably. In this case, I think, the complexity is difficult to assess (anyway at worst roughly equal to the naive implementation, and for most inputs of $N$ less than it).

1
On

Given the three lists

L1 = [5,3,3,1]
L2 = [8,7,0,0]
L3 = [3,1,1,1]

we want to find the seventh highest summand that complies to your rules. I will show the way to do this. The design of an appropriate algorithm with appropriate data structures is up to you.

We create a table where each row represents a summand constructed from the list element in the predescribed way. We start with

0 0 0  5 8 3  16 

This is the highest possible summand, because it uses the highest values of each list. The first three numbers, 0 0 0, are the indexes of the summands in their list, the second three numbers are the values of these summands and the seventh position is the sum. The igths positions is a mark that is set if a summand was already extended. At the moment the summand was not extende so the eights position is empty.

Now we extend the line by adding the possible candiated for the second highest values, these are the following summands

0 0 0  5 8 3  16 x
1 0 0
0 1 0
0 0 1

Now we complete and reorder the lines by the size of the sum. When we actually implement the algorithm we do not reorder the rows but we use an appropriate data structure to hold the rows.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15  
1 0 0  3 8 3  14
0 0 1  5 8 1  14 

Now we extend the unmarked row with the highest sum and mark it.

0 0 0  5 8 3  16 01 x
0 1 0  5 7 3  15 02 x
1 0 0  3 8 3  14 03
0 0 1  5 8 1  14 04
1 1 0  
0 2 0  
0 1 1  

We complete the rows and reorder it

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14
0 0 1  5 8 1  14
1 1 0  3 7 3  13
0 1 1  5 7 1  13 
0 2 0  5 0 3   8

We extend the unextended rows with the highest sum. So we extend two rows, because both have the same highest sum 14.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
1 1 0  3 7 3  13
0 1 1  5 7 1  13 
0 2 0  5 0 3   8
2 0 0
1 1 0 !
0 1 1 !
1 0 1
0 1 1 !
0 0 2

We remove summands that were already generated. They are marked by a ! and complete the rows.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14
0 0 2  5 8 1  14
1 1 0  3 7 3  13
0 1 1  5 7 1  13 
1 0 1  3 8 1  12
0 2 0  5 0 3   8

Now remove all but the first seven rows.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14
0 0 2  5 8 1  14
1 1 0  3 7 3  13

And now we extend the unextended rows with the highest sum. These are again two rows with sum 14.

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14 x
0 0 2  5 8 1  14 x 
1 1 0  3 7 3  13
3 0 0
2 1 0
2 0 1
1 0 2
0 1 2
0 0 3 

We complete the rows

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14 x
0 0 2  5 8 1  14 x 
0 0 3  5 8 1  14 
1 1 0  3 7 3  13
2 1 0  3 7 3  13 
0 1 2  5 7 1  13 
3 0 0  1 8 3  12 
2 0 1  3 8 1  12 
1 0 2  3 8 1  12  

and remove all but the first seven

0 0 0  5 8 3  16 x
0 1 0  5 7 3  15 x
1 0 0  3 8 3  14 x
0 0 1  5 8 1  14 x
2 0 0  3 8 3  14 x
0 0 2  5 8 1  14 x 
0 0 3  5 8 1  14 

In the next step we will mark the last line and stop, because the table will not change. The seventh highest sum is 14 and there are different triples of summands that sum up to 14.


# a piece of pseudocode that describes the algorithm

# a tuple is an object that contains all information needed:
#     the index, the position in a heap if it is member of a heap
#     one can get the sum associated to this tuple by tuple.sum()
#     the method "successors" generates all k immediate successors of a tuple
#     for the tuple with index (i1,i2,i3,...) these are the tuples with index 
#         (i1+1, i2, i3, ...)
#         (i1, i2+1, i3, ...) 
#         (i1, i2, i3+1, ...) 
#         ...
  
# object largest_elements: a list of tuples
object elements_kept: a set to store tuples
object maxheap: a heap where pop retrieves and removes 
    the tuple with largest tuple.sum of the heap, 
    remove deletes an element of the heap,
object minheap: a heap where pop retrieves and removes 
    the tuple with smallest tuple.sum of the heap, 
    remove deletes an element of the heap 
N: we want to find the N_th largest tuple


# initialisation
tuple=Tuple(list of list of summands)

# this creates the first tuple with index (0,0,0)

maxheap.insert(tuple)
minheap.insert(tuple)
elements_kept.add(tuple)

# loop
while not maxheap.is_empty and largest_elements.count<N:
  tuple=maxheap.pop() # pop the tuple with largest tuple.sum() 
  minheap.remove(tuple) # pop the tuple with smallest tuple.sum() 
  largest_elements.append(tuple)
  heaplength=heaplength-1
  
  # loop invariants:
  # number of tuples of largest_elements +heaplength == N
  # tupels of maxheap == tuples of minheap
  # number of tuples of maxheap <= heaplength
  # tuples of elements_kept = tuples of  maxheap union tuples of largest_elements 
  # the intersection of tuples of maxheap and tuples of largest_elements is empty
  
  
  
  # if the set of successors contain a new element then remember this. 
  for t in tuple.successors():
    # same loop invariants as above 
    if t not in elements_kept:
      elements_kept.add(t)
      maxheap.insert(t)
      minheap.insert(t)
      # If we now remember more than heaplength elements forget the smallest
      while elements_kept.count()>heaplength:
        m=minheap.pop()
        maxheap.remove(m)
        elements_kept.remove(m)

# if we have finished the loop either we have found the largest N tuples
# or we cannot generate N tuples from the given lists

if largest_elements.count()==N:
  print("N-th largest element is", largest_elements.last())
else:
  print ("we cannot generate N elements from the given lists")

Here is a dynamic programming approach

N=10  # the Nth sum should be found
listOfListOfNumbers = [[5,3,3,1], [8,7,0,0], [3,1,1,1]]

new_sums[0] = ()
for listOfNumbers in listOfListOfNumbers:
  item_index=-1
  sums=new_sums  # copies new_sums to sums
  new_sums.empty() # creates an empty object
  for number in listOfNumbers:
    item_index=item_index+1
    for (sum, index) in "all key value pairs of sums":
      new_sums[sum+number]=index.append(item_index) 
        # append appends another coordinate to the index
        # e.g. (2,1,0).append(4) is (2,1,0,4)

for (sum, index) in "all key value pairs of new_sums ordered by sum":      
   cnt=cnt+1
   if N==cnt:
      return (sum, index)

 #  I nothing was retunred we did not find N different sums.