Is there an efficient algorithm to decide whether a particular subset of given integers exists?

327 Views Asked by At

let $ k+1 $ positive integers $ A>a_1 \ge a_2 \ge \ldots \ge a_k $ be given. A subset $ M \subseteq \lbrace 1,\ldots,k \rbrace $ is called good, if

(i) $ \sum_{m \in M} a_m \ge A $

(ii) Property (i) does not hold for any proper subset $ M' \subset M $.

Is there an efficient way (i.e., for instance of order $ k^d $ for some small $ d \in \mathbb{N} $) to decide whether there exists a good set $ M $ with $ |M| \ge 3 $ or not?

2

There are 2 best solutions below

7
On BEST ANSWER

Summary/outline

This is my second proposal. My initial answer was proven false, and I plan on removing it by next week. Because this is quite different, I decided to post a new answer instead of editing my previous one. If that is contrary to the network policy, I apologize in advance.

This is rather lengthy, so a tl;dr: I claim there's a way to solve the problem with a $k\log k$ test. [EDIT]: I now believe this can be done in $\mathcal O(k)$. Originally, the $k\log k$ cost came from a sort operation that can in fact be simplified. More details in the new section dedicated to the complexity, after I explain the principle of the method.

I introduce a characterization of the desired subset, and then explain the principle of the algorithm I derive from that characterization. Because that algorithm fails if we use its naive introduction, I then explain how to deal with the issue. Finally I provide some pseudo-code.


Necessary property

Claim: There exists a good subset $M$ with $\lvert M\rvert\ge 3$ if and only if there are two indices $i$ and $j$ such that:

  • $i<j<k$
  • $a_i+a_j <A$, and
  • $a_i+a_j+\sum_{l=j+1}^ka_l\ge A$

Proof: We proceed by double implication.

Let $M$ a good subset with cardinality at least 3. Let $i<j$ be its two minimum indices. By condition (ii), $a_i+a_j< A$. By condition (i) we have $a_i+a_j+\sum_{l=j+1}^ka_l \ge \sum_{l\in M}a_l \ge A$. Therefore the two minimum indices from $M$ satisfy the property.

Conversely, suppose there are two indices $i$ and $j$ that satisfy the property. Initialize $M=\{i,j\}$. Loop over index $l$, starting from the value $j+1$ up to $k$. Add index $l$ to $M$ if the sum over elements in $M$ is strictly smaller than $A$. If the sum is greater than or equal to $A$, we can stop. Because we add elements that are smaller and smaller, the resulting $M$ will be good. In other words when we finally reach a subset $M$ whose sum is greater than (or equal to) $A$, condition (ii) will also be fulfilled.


Principle of the method

This equivalence suggests that we can loop over possible indices $j$, that is $2\le j\le k-1$, and try to look for some index $i$ that fits the criterion. That is, given some index $j$, can we find an index $i$ such that:

  • $i<j$
  • $A-\sum_{l=j}^ka_l\le a_i< A-a_j$

Note that $i<j$ implies $a_i\ge a_j$, thus $a_i$ must also fulfill that inequality. Ideally, I would like to define $b^-_j=\max\{ a_j;\ A-\sum_{l=j}^ka_l\}$ and $b^+_j=A-a_j$, then say that it suffices to find some index $i$ such that $b^-_j\le a_i<b^+_j$. That however does not work because $a_i\ge a_j$ does not imply $i<j$. Indeed, some of the values $a_{i/j}$ can be repeated.

[EDIT]: As someone pointed out in the comments, this problem can also be expressed as finding $i$ with $b^-_j\le a_i<b^+_j$, but then you realize that $i=j$.

IF we actually had equivalence, we could achieve a test for a good subset $M$, $\lvert M\rvert\ge 3$ by using a sorted list. First we drop any interval $[b^-_j,b^+_j)$ that is empty, in other words if $b^+_j\le b^-_j$, we can forget about that particular $j$ candidate. Then we sort every (remaining) $b^-_j$, $b^+_j$ and $a_i$, with the convention that for equal values, $b^-_j$ is smaller than $a_i$, and $a_i$ is smaller than $b^+_j$. We could go over the sorted list and:

  • When we pop out some $b^-_j$, the corresponding interval is declared "open".
  • When we pop out some $b^+_j$, the corresponding interval is declared "closed".
  • When we pop out some $a_i$, two cases arise. If some interval is "open", then we have just found a good pair $i,j$. If no interval is "open", then we just continue with other values.

On the complexity

[EDIT]: I changed my complexity claim to linear (previously $k\log k$).

In general, sorting arbitrary lists is a $k\log k$ process, so the procedure above would have $k\log k$ complexity. However, here the lists are not arbitrary. We actually have to merge lists that are already sorted, and that can be done in linear time.

Indeed, the $a_i$ are decreasing, and thus $b^+_j=A-a_j$ is increasing. Because the $a_i$'s are positive integers, the sum $\sum_{l=j}^ka_l$ is strictly decreasing, and thus $A-\sum_{l=j}^ka_l$ is strictly increasing. It follows that $b^-_j$ is a maximum between two monotonous sequences and can also easily be merged.

These remarks also hold with the modified values $\hat a_i$, $\hat b_j^-$ and $\hat b^+_j$ that I introduce below.


Solving the issue of repeated values ($i\ge j$, $a_i\le a_j$).

For my proposed method to work, we need to obtain a way to have an equivalence between $i<j$ and $a_i>a_j$. Because we are working with integers, this can be achieved by defining new values $\hat a_i$ that are not necessarily integers.

If $a_i$ appears only once, we let $\hat a_i=a_i$.

If the value $a_i$ appears several times, say $n$ times at indices $i_1<\ldots<i_m<\ldots<i_n$. Then we let $\hat a_{i_m}=a_i+\frac{m-1}n$. We thus obtain non-integer values with $$a_i+1 > \hat a_{i_1} >\ldots> \hat a_{i_m} >\ldots> \hat a_{i_n}=a_i$$ It follows that $i<j$ is equivalent to $\hat a_i>\hat a_j$.

[EDIT]: Tl;dr small change to the definition of $\hat b^-_j$ so that my claim 2 can have a correct proof. That change requires a new notation $n_i$.

For an index $i$, let $n_i$ be the number of times the value $a_i$ appears in the original list (possibly $n_i=1$). Notice that $\frac 1{n_i}$ is the distance between consecutive values $\hat a_{i_m}$ and $\hat a_{i_{m+1}}$, when $n_i\ge 2$.

Then we define $\hat b^-_j=\max\{ \hat a_j+\frac 1{2n_j};\ A-\sum_{l=j}^ka_l\}$ and $\hat b^+_j=A-a_j$. Observe that regardless of the value of $n_j$, we have $$\hat a_{j-1} > \hat a_j+\frac 1{2n_j} > \hat a_{j} $$

[EDIT]: Claim 2 remains unchanged, but its proof has changed to reflect the change in definition.

Claim 2: We have $\hat b^-_j\le \hat a_i< \hat b_j^+$ if and only if

  • $i<j$, and
  • $A-\sum_{l=j}^ka_l\le a_i< A-a_j$

Proof: Assume first that $\hat b^-_j\le \hat a_i< \hat b^+_j$. By definition of the various terms we have $$ \hat a_i\ge\hat b^-_j\ge\hat a_j+\frac 1{2n_j}>\hat a_j $$ The part that actually interests us is $\hat a_i>\hat a_j$, since it implies $i<j$. We get our first condition. For the second condition, notice that because $A-\sum_{l=j}^ka_l$ is an integer we can use the floor function and preserve inequalities: \begin{align*} \hat b^-_j\le \hat a_i< \hat b^+_j &\implies A-\sum_{l=j}^ka_l\le \hat a_i< A-a_j\\ &\implies \left\lfloor A-\sum_{l=j}^ka_l\right\rfloor\le \lfloor\hat a_i\rfloor< A-a_j\\ &\implies A-\sum_{l=j}^ka_l\le a_i< A-a_j \end{align*} Thus, the two conditions are verified.

Conversely, suppose that $i<j$ and $A-\sum_{l=j}^ka_l\le a_i< A-a_j$. Because $$A-\sum_{l=j}^ka_l\le a_i\le \hat a_i< a_i+1\le A-a_j=\hat b^+_j$$ we have $$A-\sum_{l=j}^ka_l\le \hat a_i<\hat b^+_j$$ Because $i<j$ we know that $$ \hat a_i\ge \hat a_{j-1}>\hat a_j +\frac 1{2n_j} $$ Combined with the above, we deduce $\hat b^-_j\le \hat a_i$. The two conditions thus imply $\hat b^-_j\le \hat a_i<\hat b^+_j$, which concludes the proof.


Pseudo-code

In this pseudo-code, b+_j denotes $\hat b^+_j$, likewise b-_j is for $\hat b^-_j$. Also, a_i actually denotes $\hat a_i$. Computing the values of $\hat a_i$ can be done in linear time wrt $k$.

Compute the various b+_j and b-_j          //linear wrt k
Discard any pair b-_j/b+_j if b+_j <= b-j  //forget about intervals with no solution

Sort the b+_j, b-_j and a_i in a list L    //k log k

state = [0,...,0]                          //array of k values for interval status: 0=closed, 1=open
open_flag = 0                              //counter value indicating if some interval is open


for x in L:

    if x is an element of type b-_j:        //open interval j
        state[j] = 1
        open_flag += 1
    end if

    if x is an element of type b+_j:        //close interval j
        state[j] = 0
        open_flag -= 1
    end if

    if x is an element of type a_i:         //check if an interval is open
        if open_flag > 0:
            return True or find a good pair i,j using the "state" array
        end if
    end if

end for             //this loop was linear wrt the size of L, thus linear wrt k


return False        //if we reach this point, we never found a good pair

Note that if we do not care about finding a good subset, and only about its existence, we do not need to care about the state array.

9
On
  1. If a good subsequence of $a_1,\ldots,a_k$ exists then the sum of all $a_i$ must be greater or equal than $A.$ So in the following we assume that the sum of the sequence is greater or equal than $A.$
  2. If the sum of the sequence is greater or equal than $A$ then in linear time a good subsequence can be calculated. Simply scan through the elements of the sequence and remove an element from the sequence if the remaining elements sum up to a value greater or equal to $A$. If you have scanned through the whole sequence the remaining elements sum to a value greater or equal to $A$ but if you remove one of these elements the sum of the remaining will be less than $A.$ The sequence you find with this algorithm may have 1, 2, 3 or more elements. If you end with 1 or 2 elements you don't know if a good sequence exists with 3 or more elements exists.
  3. If the sum of each pair of elements (with different indexes) of a sequence is less than $A$ than the algorithm described at 2 will find a good sequence with 3 or more elements in linear time because it cannot end with a one or two elements.

  4. If $s_3$ is a good subsequence with three or more elements then it cannot have two elements that sum up to a value greater or equal than $A.$

  5. If $s_1$ is a subsequence of $s$ and $s_2$ is a subsequence of $s_1$ and therefore a subsequence of $s$ than $s_2$ is good with respect to $A$ and $s$ if and only if $s_2$ is good with respect to $s_1$ and $A.$

  6. Define $ss(s,i,A)$ as the subsequence of a sequence $s$ such that $a_k$ is in this subsequence either if $k=i$ or if $i \lt k$ and $a_i+a_k<A.$

  7. If $s_2$ is a good subsequence of a sequence $s$ with respect to $A$ and $a_i$ is its largest element then $s_2$ is a good subsequence of $ss(s,i,A)$ with respect to $A$. A good subsequence of $ss(s,i,A)$ can be found in linear time as described in 3. A good subsequence of $ss(s,i,A)$ with respect to $A$ must have three or more elements.

  8. for all indexes $i$ of $s$ check if there is a good subsequence in $ss(s,i,A)$. If not, then there is not such a subsequence. If yes, you have found one.

Example1

If $s$ is the sequence $8,8,7,6,1,1$ and $A=9$ this works in the following way: we investigate $$ss(s,1,9)=8$$ $$ss(s,3,9)=7,1,1$$ $$ss(s,4,9)=6,1,1$$ $$ss(s,5,9)=1,1$$

The first two have less than 3 elements and the third and the following one does not sum up to 9 or higher. So there is no good subsquence.

Example2

If $s$ is the sequence $8,8,7,6,2,2,1,1$ and $A=9$ this works in the following way: we investigate $$ss(s,1,9)=8$$ $$ss(s,3,9)=7,1,1$$ $$ss(s,4,9)=6,2,2,1,1$$ $$ss(s,5,9)=2,2,1,1$$ $$ss(s,7,9)=1,1$$

Only the third one is of interest. We scan through $6,2,2,1,1$ from the beginning. We cannot drop $6$ but the first $2$ and the first $1$ and end with the good subsequence $6,2,1$


From this we construct the following algorithm that needs $O(k)$ time.

Input

All numbers are integer. We have given $k$ such that $$k \ge 3 \tag{1}$$ a finite sorted sequence $a_1,\ldots, a_k$, so $$a_1\ge a_2\ge\ldots\ge a_k \gt 0 \tag{2}$$ and a number $A$ such that $$A \gt a_1$$

Decision Algorithm

Initialization

Initialize $u$

we set \begin{align} &u:=1\\ &\text{while }(a_u+a_{k-1}\ge A) \text{ and } (u\lt k-2) \\ &\qquad u:=u+1 \\ \end{align} If now $a_u+a_{k-1}<A$ then there exists no $v$ such that $a_u+a_v\ge A$ and therefore there exists no good subset with at least $3$ elements. The algorithm will terminate.

Otherwise it continues:

Initialize $v$

\begin{align} &\text{tailsum}:=a_{k-1}+a_k\\ &v:=k-1\\ &\text{while }(u \lt v-1) \text{ and }a_u+a_{v-1} \lt A\\ &\qquad v:=v-1\\ &\qquad \text{tailsum}:=\text{tailsum}+a_{v}\\ \end{align}

loop invariant
$$\text{tailsum}=\sum_{t=v}^{k}a_t$$

Loop

\begin{align} &\text{while }(u\lt v-1) \text{ and } (a_u+\text{tailsum} \lt A)\\ &\qquad u:=u+1\\ &\qquad\text{while } (u \lt v-1) \text{ and } (a_u+a_{v-1} \lt A)\\ &\qquad\qquad v:=v-1\\ &\qquad\qquad\text{tailsum}:=\text{tailsum}+a_v\\ \end{align}

Note tha $v$ ts decremented but never incremented in this loop. It can only be decremente $k$ times. So this block is executed in $O(k)$ time.

Decision

If we have now $$a_u+\text{tailsum}\lt A$$ then $u=v-1$. We haven't found a pair (u,v) such that $$a_u+\sum_{t=v}^k a_t<A$$ until now and we will not find one when we further decrease $u$ because this will decrease $v$, too, and therefore decrease the sum $a_u+\sum_{t=v}^k a_t.$ So no good set with three or more elements will exist and the algorithm will terminate here.

If the loop terminates with $a_u+\text{tailsum}\ge A$ then $${u, v, v+1, \ldots, k}$$ will have a good subset and this will have $3$ or more elements.

Construction of the good set

\begin{align} &\text{if } \text{tailsum} \lt A \text{ then}\\ &\qquad G:=\{u\}\\ &\qquad\text{sum}:=a_u+\text{tailsum}\\ &\text{else}\\ &\qquad G:=\{\}\\ &\qquad\text{sum}:=\text{tailsum}\\ &t=v\\ &\text{while } (t<=k)\\ &\qquad \text{if } \text{sum} - a_t \lt A \text{ then}\\ &\qquad\qquad G:=G\cup \{t\}\\ &\qquad\text{else}\\ &\qquad\qquad\text{sum}:=\text{sum}-a_t\\ &\qquad t=t+1\\ \end{align}

loop invariants: $$\sum_{t\in G}a_t+\sum_{t=v}^k a_t=\text{sum}$$ $$\sum_{t\in G}a_t+\sum_{t=v}^k a_t \ge A$$ $$\sum_{t\in G\setminus \{r\}}a_t+\sum_{t=v}^k a_t \lt A,\; \forall r \in G$$


def find_good(a):
  A=a[0]  # python lists start with index 0
  k=len(a)-1
  if k<3:
    return(None)
  u=1
  ## Initialize u
  while(a[u]+a[k-1]>=A) and (u<k-2):
    u=u+1
  if a[u]+a[k-1]>=A:
    return(None)

  ## Initialize v
  v=k-1
  tailsum=a[k-1]+a[k]
  while(a[u]+a[v-1]<A) and (u<v-1):
    v=v-1
    tailsum=tailsum+a[v]

  # loop
  while((u<v-1) and (a[u]+tailsum<A)):
    u=u+1 
    while((u<v-1) and (a[u]+a[v-1]<A)):
      v=v-1
      tailsum=tailsum+a[v]

  # decision
  if ((a[u]+tailsum)<A):
    # no solution exists
    return(None)

  # construction of a goot set:
  if (tailsum<A):
    G=[u]
    sum=a[u]+tailsum
  else:
    G=[]
    sum=0
  t=v
  while (t<=k):
    if sum-a[t]<A:
      G.append(t)
    else:
      sum=sum-a[t]
    t=t+1

  #prepare return value
  H=[a[k] for k in G] # H is a[g1], a[g2],...
  return(G, H)

Here is a link to the program: https://repl.it/@miracle173/findgood2