Subset closest to X (knapsack problem)

370 Views Asked by At

Given a list of numbers e.g. [1,2,3,4,5,6] and a positive integer X, how can I find out what numbers from my list are required to get X and if not get X then the closest to X? This is for a python function, but I understand it is a mathematical issue.

if A is [12, 79, 99, 91, 81, 47] and s is 150, it will return [12, 91, 47] as 12+91+47 is 150

if A is [15, 79, 99, 6, 69, 82, 32] and s is 150 it will return [69, 82] as 69+82 is 151, and there is no subset of A whose sum is 150.

>>> def closest_sum(A, X):
...     closest = {}
...     closest_distance = None
...     for S in A:
...         if closest_distance is None or abs(sum(S) - X) < closest_distance:
...             closest = sum(S)
...             closest_distance = abs(sum(S) - X)
...     return closest
...
>>> closest_sum([{1, 3}, {4, 5, 6}], 5)
4
2

There are 2 best solutions below

4
On

This is the subset sum problem, which is known to be NP-complete (loosely meaning that if a polynomial-time algorithm exists to solve this problem, then many other hard problems could be solved in polynomial time).

Since this problem is NP-complete, there is essentially no faster solution than to iterate over all subsets of $A$ and check their distance from $X$. The pseudocode would look like this:

function closest_sum(A, X):
    closest = {};
    closest_distance = infinity;
    for every subset S of A:
        sum = sum of elements in S;
        if abs(sum - x) < closest_distance:
            closest = S;
            closest_distance = abs(sum - x);
    return closest

This takes exponential time of course, since there are $2^{|A|}$ subsets $S\subseteq A$.

2
On

You can solve the problem via mixed integer linear programming as follows. Let binary decision variable $y_i$ indicate whether item $i$ is selected. The problem is to minimize $$\left|\sum_i i y_i - X\right|,$$ which you can linearize by introducing nonnegative variables $s$ and $t$, and minimizing $s+t$ subject to $$\sum_i i y_i - X = s - t.$$ This approach is generally much faster than brute force.