Is there an effective algorithm to solve this binary integer linear programming?

974 Views Asked by At

I am an applied math undergraduate student. On my project, I come across an integer linear programming question as follow:

Given $x_0,x_1,...,x_n$: $\forall$ i $\in$ [0,n], $x_i$ = 0 or 1

min Z = $\sum_{i=0}^n x_i $

with m numbers of constraints in the format $\sum x_j \ge 1$: j $\in$ [0,n].

I know this is generally a problem with NP hardness but I was wondering if there is an effective algorithm to solve this problem.

2

There are 2 best solutions below

8
On BEST ANSWER

The reduction of your problem to an instance of SetCover is perhaps clear to you. We will show that the reverse is also possible, to formulate any SetCover instance as a binary integer program of the kind you are asking about, and this will establish that the resulting minimization computation is NP-hard.

A good survey of approximate methods for such problems (including rounding from linear programming) is given by Fatema Akhter (2015), "A Heuristic Approach for Minimum Set Cover Problem", where a modified hill climbing algorithm is proposed and its performance compared on a library of SetCover problems.

First let's clear up the notation used for the constraints, because what was written in the Question: $$m \text{ numbers of constraints in the form } \sum x_j \ge 1: j \in [0,n]$$

does not adequately distinguish among different constraints. For each constraint there should be a subset $C_k \subseteq \{0,\ldots,n\}$ of indices over which the sum is taken:

$$ \sum_{j \in C_k} x_j \ge 1 $$

For convenience we will refer to this latter constraint by its index subset $C_k$, so that the collection of $m$ constraints is $\mathscr{U} = \{C_1,\ldots,C_m\}$.

This $\mathscr{U}$ will be the universe of our SetCover instance, and the family of sets from which a minimum cover will be found is $\mathscr{S} = \{S_0,S_1,\ldots,S_n\}$ where:

$$ S_i = \{ C_k : x_i \in C_k \} \text{ for } i=0,1,\ldots,n $$

Note that the sets $S_i$ are in one-to-one correspondence with your variables $x_i$. Let a minimum cover $\mathscr{S}' \subseteq \mathscr{S}$ of the universe be found, and set $x_i = 1$ if and only if $S_i \in \mathscr{S}'$, $x_i = 0$ otherwise. A little thought should convince us this assignment minimizes the objective:

$$ Z = \sum_{i=0}^n x_i $$

subject to the constraints $C_k$, which amount requiring each subset $C_k$ contains at least one variable $x_i$ which is assigned a value $1$.

In other words we have described how to choose which variables $x_i=1$ so that their sum $Z$ is minimized.

With a little further thought and explanation, this reduction can be reversed so than any SetCover instance can be expressed as a similar binary integer programming problem. Going in this direction we are given a finite universe $\mathscr{U}$ which we suggestively write as a set of $m$ points:

$$ \mathscr{U} = \{c_1,\ldots,c_m\} $$

and a finite nonempty family of subsets $\mathscr{S} = \{S_0,S_1,\ldots,S_n\}$ whose union is all of $\mathscr{U}$.

Define binary "indicator" variables $x_i \in \{0,1\}$ corresponding to the subsets $S_i$, so that for any choice of subfamily $\mathscr{S}' \subseteq \mathscr{S}$:

$$ x_i = \begin{cases} 1 & \text{if } S_i \in \mathscr{S}' \\ 0 & \text{otherwise} \end{cases} $$

Now as before we ask to minimize $Z = \sum_{i=0}^n x_i$ subject to the $m$ constraints:

$$ \sum_{i \in C_k} x_i \ge 1 $$

for $k = 1,\ldots,m$ where $C_k = \{ i : c_k \in S_i \}$.

A feasible solution $x_0,x_1,\ldots,x_n$ provides us a cover $\mathscr{S}'$ of the universe through the settings of indicator variables $x_i$. That is, since every constraint is satisfied, each $c_k \in \mathscr{U}$ will belong to at least one $S_i \in \mathscr{S}'$. A solution that minimizes $Z$ will at the same time minimize the size of $\mathscr{S}'$, since $Z = |\mathscr{S}'|$.


Having shown that your problem, at least in the generality described above, is NP-hard, it may be of interest to point out algorithms for approximate solutions to SetCover which produce in polynomial-time, covers that are within some factor of the true minimum cover size.

In the context of a universe of size $m$, the greedy algorithm (choose at each step a subset $S_i$ which contains the most points not yet covered) produces a solution whose size is at most $\log m$ times the true minimum size.

The Wikipedia article on SetCover which we linked above gives reference to a fairly recent paper by Dinur and Steurer (2013) which proves that polynomial-time algorithms cannot give much better approximation factors unless P = NP. This is the culmination of various investigations, showing "that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover to lower order terms."

13
On

Actually, this is not a difficult problem, it can be solved efficiently by interpreting it as a network flow problem. Suppose you have the following problem: $$ \min \; x_1+x_2+x_3 $$ subject to $$ x_1+x_2 \ge 1\\ x_1+x_3 \ge 1 \\ x_1,x_2,x_3 \in \{0,1\} $$

Now, create the following graph:enter image description here

Add a unitary cost on arcs $(s,1), (s,2), (s,3)$, and impose that the flow $f$ on all arcs ending in node $t$ is at least equal to $1$. Use any minimum cost flow algorithm to solve the problem and you are done.


EDIT $1$: There is a contradiction between this answer and @hardmath's answer: one of the two is not possible. The problem cannot be NP-hard and be solved in polynomial time. I suspect there is something wrong with my flow model, but I cannot figure out why. Ahuja and Magnanti (Networks Flows) have proposed shortest path algorithms for this problem if there are consecutive 1's in the columns of the matrix, but not for the general case. This is a hint that I believe something must be wrong with my flow model for the general case. Below is their network flow model if there are consecutive 1's in the columns of the matrix. See their book for more details.

enter image description here


EDIT $2$:

As @Erwin Kalvelagen pointed out, the above network flow model is wrong. The reason for that is that the inflow in node $s$ does not equal the objective function (the number of active variables). It counts the total number of times each variable is used, which is not what we want.

In summary:

  1. As proved below by @hardmath, the problem is NP-hard, so in general, no polynomial time algorithm can solve it (unless $P=NP$).
  2. You have two options left:
    • you can use approximation algorithms as mentioned by @hardmath
    • if your set covering problem has a particular type of structure, specifically, if all $1's$ are consecutive in the matrix columns, then the problem is no longer NP-hard and you can use the approach described above in Ahuja and Magnanti's book.