An algorithm to find X numbers that sum up to a given value

3.5k Views Asked by At

I have this little problem and I was wondering if some mathematician here knew something useful about how to solve this or even how to approach this right.

In the simplest terms I have a set of integers $\mathbf{S}$ and I have a value, let's say another integer $\mathbf{k}$, at any given time I would like to identify a subset of $\mathbf{S}$, let's say $\mathbf{s}$ such as

$$\forall\ \mathbf{x_n} \in\ \mathbf{s},\ \mathbf{x_1}+\mathbf{x_2} + \dots \approx \mathbf{k}$$

where

$$\mathbf{n} \geq 1$$

In other words I need to locate a subset where the sum of its own elements are equal or near the value of my $\mathbf{k}$ and the number of those elements can be anything from just 1 to a small arbitrary integer ( I don't need too many elements, better if they are in a small subset ).

I would like to have control not over $\mathbf{n}$ directly, but over the range where $\mathbf{n}$ will be.

So in this algorithm my inputs would be:

  • the set $\mathbf{S}$
  • my $\mathbf{k}$
  • the range for $\mathbf{n}$

A final note is about the fact that I would like to have a non deterministic algorithm, the value of the sum and $\mathbf{k}$ can be the same, but I would prefer if they will be near .

1

There are 1 best solutions below

1
On

I take the request for "a nondeterministic algorithm" to mean an algorithm that can return all possible solutions (or at least more than one) when several exist. Allowance for an inexact summation makes it more likely that multiple solutions do exist, nearly to the point of being unable to say whether or not a subset of size $n$ of $S$ is or is not a solution.

First consider the "intelligent" brute-force approach of generating all such subsets of size $n$, computing their sum, and comparing with the target $k$. If $|S|=m \ge n$, there will be $\binom{m}{n}$ subsets to generate and test, possibly returning those which achieve best possible approximation of $k$ or those which come within some (not yet specified) tolerance of $k$.

For fixed $n$ this requires computational effort proportional to $n \binom{m}{n} $. We can do this beginning with $n=1$ and continue (if necessary) until the top of the range allowed for $n$ is reached.

Since the emphasis is on solutions where $n$ is "a small arbitrary integer", we should point out efficiencies that are available when $n=1,2$. If $S$ is sorted into an array, then binary search can be applied for $n=1$ to find the greatest lower bound and least upper bound for $k$ in $S$, one or both of which give the best approximation to $k$ (though perhaps only one with exist) with complexity $O(\log(m))$. [NB: The cost of sorting $S$ is $O(m \log(m))$ if we must include that effort.]

The case $n=2$ can be considered a special case of an often-used interview question. [See also this Answer on a similar Question.] Again assume the set $S$ is sorted into an array. Starting with the smallest and largest elements of $S$, we decrease the upper element to the next entry when the sum exceeds $k$ or increase the lower element to the next entry when the sum is below $k$. The ensuing sequence alternates between strings of upper and lower bounding sums, and if an exact sum $k=x_a + x_b$ exists, it will be found in $O(m)$ steps. This "Saddleback Search" procedure can be modified to find the greatest lower bound and least upper bound sums of two elements, if no exact sum $k$ exists, or to return all the possible pairs that give $k$ exactly (or which meet some tolerance) if more than one exists, retaining the same $O(m)$ complexity.

Note that this improves on the $O(m^2)$ complexity of the brute-force approach, and it is not hard to argue that $O(m)$ complexity (for best sum of pairs) is optimal.

This Saddleback Search method can be combined with brute-force generation of $(n-2)$-subsets of $S$ to knock off an exponent of $m$ in higher dimensions. I'll look for some references, because I suspect this may be the optimal complexity for $1 < n \ll m$.

A 1998 paper Fast Algorithms for Generating Integer Partitions includes a useful survey of methods and their performance known at the time. Your problem is specialized to restricted partitions having "parts in $S$", and also "multiplicity one" since summands cannot be repeated.