Minimum vector sets span spaces cover problem

1.6k Views Asked by At

Instance: a set of vectors $V=\{\mathbf{v}_1,\mathbf{v}_2,...,\mathbf{v}_n\}$ and $m$ vector sets $V_1,V_2,...,V_m$, each of which contains multiple vectors ($V_i$ may not be a subset of $V$). In our problem, the elements of each vector are integer numbers. The linear span space of $\bigcup\limits_{i=1}^mV_{i}$ contains each vector in $V$.

Question: find minimum number of sets from $\{V_1,V_2,...,V_m\}$, denoted as $\{V_{i_1},...V_{i_k}\}$, such that the linear span space of $\bigcup\limits_{j=1}^kV_{i_j}$ contains each vector in $V$.

There may exist a reduction to our problem from SET COVER, which is NP-complete.

I try to find an approximate algorithm to solve this problem with a provable approximation ratio.

One prossible algorithm is that at each step, we can remove one subset $V_i$ from $\{V_1,V_2,...,V_m\}$ under the condition that the span space of union of remainning subsets contains V. For this algorithm, the key issue is how to select a "good" $V_i$ at each step to make the final result is near to the optimal result. This approach may have no provable approximation ratio.

The worst case that can be used to verify the approximiation ratio of the approximation algorithm is shown as following:

$V_1=\{[1,0,\cdots,0]\}$, $V_2=\{[0,1,0,\cdots,0]\}$,...,$V_m=\{[0,0,\cdots,0,1]\}$, in which each set contains one row vector of a $m\times m$ identity matrix. $V={[0,0,\cdots,0,1,1]}$. Therefore, the optimal solution is $\{V_{m-1},V_{m}\}$ and the minimum number is 2.

If you have ideas and suggestions, it's pleasure if you can share with me.

Thanks very much!

3

There are 3 best solutions below

6
On

The problem can clearly be solved by an extensive search, and, in fact, a much faster algorithm can hardly be expected. Indeed, taking $v_1,\ldots,v_n$ to be a basis of $\mathbb{R}^n$ and $V_i\subset V$ gives a reduction to our problem from SET COVER, which is NP-complete.

On the other hand, there is a fast way to construct an inclusion-minimal subset of $V'=\{V_1,\ldots,V_m\}$ whose vectors span those from $V$. In fact, we can check, for any $i$, whether $V'\setminus\{V_i\}$ span vectors from $V$. If this is not the case, then $V'$ is minimal; otherwise, we conclude that vectors from $V'\setminus\{V_j\}$ span $V$, and we proceed by recursion.

3
On

Let $\mathcal{W}$ be the span of $V$. If $V$ is not independent, no harm in reducing it to an independent subset (i.e. a basis). You did not explicitly require $V_i \subset \mathcal{W}$ but that can be achieved by projection. And we can (then) make assume that each $V_i$ is independent.

The restricted case that all the $V_i \subset V$ does reduce it to set cover which is NP complete but gives an approximation by the greedy algorithm of always taking something which covers the most previously uncovered things. That gives at worst $\ln{s}+1$ times as many sets as the minimum where $s$ is the maximum of the $|V_i|$, and, the linked article says, in some sense one can't expect much better.

So what about the case that it is not required to be set cover? We certainly can't expect anything better than the restricted case. The following greedy strategy seems worth looking at, perhaps whatever the proofs are in the restricted case adapt:

Having made all the $V_i$ independent, take a largest one, say $V_m$. So keep $V_m$ for the cover. We know from linear algebra that there is a subset $V' \subset V$ with $V_m \cup V'$ a basis of $\mathcal{W}$. Replace $V$ with $V'$, $\mathcal{W}$ with $\mathcal{W}'$ spanned by $V'$ and each $V_i$ ( for $1 \le i \lt m$) by the projection $V'_i$ onto $\mathcal{W}'$ (reduced to an independent set.) repeat!

12
On

The reduction from set cover outlined in the other answer is approximation preserving. So it is NP-hard to approximate the optimal solution better than $\ln n$.

In the other direction I do not know how to handle the general case. However, if all vectors in all $V_i$ are contained in the span of $V$, and $V$ spans a subspace of dimension $d$, then a modification of the greedy algorithm gives a $\ln d$ approximation ratio. I give some details of the algorithm next.

Let $\cal V$ be the span of $V$. Since all $V_i$ are subsets of $\cal V$, we are looking for the smallest set $I$ such that $\bigcup_{i \in I}{V_i}$ contains a basis for $\cal V$. Let ${\cal V}_i$ be the span of $V_i$. Let $\cal W$ be the span of the vectors picked so far (if this is the first step of the algorithm, ${\cal W} = \{0\}$.) Pick the set $V_i$ that maximizes $$ f({\cal V}_i, {\cal W}) = \dim~({\cal W} \vee {\cal V}_i) - \dim \cal{W}. $$ Here $\vee$ is the subspace join, i.e. the span of the union.

The main observation in the analysis of the greedy algorithm for set cover still works in this situation. Assume, without loss of generality, that the optimal solution is the sets $V_1, \ldots, V_k$. Let ${\cal W}_t$ be the space spanned by the sets picked in the first $t$ steps of the algorithm. We have $$ {\cal V}_1\vee \ldots \vee {\cal V}_k = ({\cal V}_1 \vee {\cal W}_t) \vee \ldots \vee ({\cal V}_k \vee {\cal W}_t) = {\cal V} $$ The dimension is non-negative and submodular, and I claim that for any fixed ${\cal W}_t$, $f(\mathcal{U}, {\cal W}_t)$ is a non-negative submodular function of ${\cal U}$. Indeed, a constant shift of a submodular function is submodular, and non-negativity is trivial. Because a non-negative submodular function is sub-additive with respect to joins, we have $$ d - \dim \mathcal{W}_t = f(\mathcal{V}, \mathcal{W}_t) \leq \sum_{i = 1}^k{f({\cal V}_i, {\cal W}_t}). $$ Therefore, for some $i$, $f({\cal V}_i, {\cal W}_t) \geq \frac{d - \dim \mathcal{W}_t}{k}$. Setting $\mathcal{W}_{t+1} = {\cal V}_i \vee {\cal W}_t$, we have $d - \dim \mathcal{W}_{t+1} \leq (1-1/k)(d - \dim\mathcal{W}_t) \leq e^{-1/k}(d - \dim\mathcal{W}_t)$. So after $1 + k\ln d$ iterations, $\dim \mathcal{W}_t = d$ and we are done.

There is a common abstract generalization of set cover and your problem in terms of matroids. The abstract problem is, given subsets $S_1, \ldots, S_m$ of the ground set of a matroid, find the smallest set $J \subseteq \{1, \ldots, m\}$ so that $\bigcup_{j \in J}{S_j}$ contains a basis. The achievable approximation ratio is $\ln r$, where $r$ is the rank of the matroid. The matroid for set cover is the uniform matroid of rank $n$: any subset of the ground set is independent. One way to construct the matroid for the special case of the subspace cover problem above is as follows. The ground set is the union of the sets $V_i$, and a subset $U$ of the ground set is independent if the vectors in it are linearly independent. The greedy algorithm picks the set that increases the rank function the most, and the analysis is the same, using the fact that the rank function of a matroid is submodular and non-negative.