Why can't set cover be reduced to min-cost max-flow?

776 Views Asked by At

Okay, so I know obviously I'm making some kind of easy mistake here, since set cover is NP-complete and min-cost max-flow is in P, but I can't figure out what the mistake is.

So, given a universe $U$ and a set $S$ such that the union of all sets in $S$ is $U$, the set cover problem asks for the smallest subset of $S$ such that the union of all sets in that subset is $U$.

My question, then, is why we can't construct a min-cost max-flow graph as follows:

  1. Create one node for each set in $S$, and one node for each element in $U$.
  2. Draw edges from the source to each set $A \in S$ with 1 cost and infinite capacity.
  3. Draw edges from each element in $U$ to the sink with 0 cost and unit capacity.
  4. Draw edges from each set in $S$ to each of the elements in $U$ that it contains with 0 cost and infinite capacity.
  5. Find the min-cost max-flow of the resulting network; the sets $A \in S$ that have flow running through them should comprise a set cover.

Wikipedia provides an example problem with $U = \{1, 2, 3 ,4, 5\}$ and $S = \{ \{1, 2, 3\}, \{2, 4\}, \{3, 4\}, \{4, 5\} \}$ -- here is a picture I drew to illustrate.

The resulting graph will have $|S| + |U| + 2$ nodes, and $\Sigma_{A \in S} |A|+ |S| + |U| + 2$ edges, which I believe should be polynomial on the size of the input. So what am I doing wrong here? Thanks!

3

There are 3 best solutions below

0
On BEST ANSWER

Oops, I figured it out actually. This may have been what @Μάρκος Καραμέρης was trying to tell me, but I was misunderstanding the formulation of the min-cost max-flow problem. The cost of an edge is not a one-time cost to use that edge, but rather the cost per unit of flow.

2
On

To put it simply: your algorithm only guarantees to fill every node in the last(right) layer but it is not restricted anywhere on the number of subsets it uses from the first(left) layer. For example in the wikipedia case you mention it could use $\{1,2,3\}$ for the $1$, $\{2,4\}$ for the $4$ and $\{4,5\}$ for the $5$. Of course the $2$ can and should be obtained from the first set but this information is never encoded in the flow construction and that is the issue.

0
On

The max-flow min-cost problem allows real-valued solutions. Set cover demands binary "flow" choices, i.e. either a set from S is used in the solution or it isn't. Max-flow min-cost's acceptance of real-valued unknowns is a relaxation of the requirements of set cover, just as linear programming is a relaxation of integer linear programming, a relaxation that unfortunately changes the problem difficulty from NP-complete to P.